Yes. io.net provides full Kubernetes support for deploying and managing GPU workloads through native CLI integration. You can create GPU-enabled Kubernetes clusters with automatic GPU scheduling, auto-scaling, and persistent storage—all configured in minutes without manual cluster provisioning or node management.
The io.net Kubernetes implementation supports standard kubectl commands, Helm charts, and popular operators (Kubeflow, Ray, MLflow), while abstracting away the infrastructure complexity. Deploy GPU pods using familiar nvidia.com/gpu resource limits, and io.net automatically provisions the appropriate GPU nodes and manages the cluster lifecycle.
Quick Start: Create GPU Cluster
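```bash
# Install io.net CLI
pip install ionet-cli
io login

# Create Kubernetes cluster with GPU nodes
io k8s create-cluster \
  --name ml-cluster \
  --gpu-type A100 \
  --min-nodes 2 \
  --max-nodes 10 \
  --region us-west

# Get kubeconfig
io k8s get-credentials ml-cluster

# Verify cluster
kubectl get nodes
# Example output (simplified; default kubectl output has no GPU column)
# NAME                      STATUS   GPU
# ionet-ml-cluster-node-0   Ready    A100
# ionet-ml-cluster-node-1   Ready    A100

# Check GPU resources (Capacity and Allocatable)
kubectl describe node ionet-ml-cluster-node-0 | grep nvidia.com/gpu
#   nvidia.com/gpu:  1
#   nvidia.com/gpu:  1
```

The two `nvidia.com/gpu` lines are the node's Capacity and Allocatable counts. Kubernetes places each pod on a single node, so a pod that requests 4 GPUs (as in the next example) can only be scheduled on a node with at least 4 allocatable GPUs.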
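To check GPU capacity across all nodes at once before sizing a pod, the JSON from `kubectl get nodes -o json` can be inspected programmatically. A minimal sketch, assuming the NVIDIA device plugin's resource name `nvidia.com/gpu`; the helper names `allocatable_gpus` and `fits_on_some_node` are hypothetical, and the sample data is illustrative.

```python
from __future__ import annotations

import json


def allocatable_gpus(nodes_json: str) -> dict[str, int]:
    """Map node name -> allocatable nvidia.com/gpu count from `kubectl get nodes -o json`."""
    doc = json.loads(nodes_json)
    result = {}
    for item in doc.get("items", []):
        name = item["metadata"]["name"]
        # Quantities are strings in the API; nodes without GPUs omit the key entirely.
        gpus = item.get("status", {}).get("allocatable", {}).get("nvidia.com/gpu", "0")
        result[name] = int(gpus)
    return result


def fits_on_some_node(requested: int, gpus_by_node: dict[str, int]) -> bool:
    """A pod lands on exactly one node, so one node must hold the whole GPU request."""
    return any(count >= requested for count in gpus_by_node.values())


if __name__ == "__main__":
    sample = json.dumps({
        "items": [
            {"metadata": {"name": "ionet-ml-cluster-node-0"},
             "status": {"allocatable": {"nvidia.com/gpu": "1"}}},
            {"metadata": {"name": "ionet-ml-cluster-node-1"},
             "status": {"allocatable": {"nvidia.com/gpu": "1"}}},
        ]
    })
    gpus = allocatable_gpus(sample)
    print(gpus)
    print(fits_on_some_node(4, gpus))  # False: no single node has 4 GPUs
```

Against a live cluster, pass the output of `kubectl get nodes -o json` instead of the sample string.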
Deploy GPU Workload
Standard Kubernetes GPU pod manifest:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pytorch-training
spec:
  containers:
    - name: pytorch
      image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
      command: ["python", "train.py"]
      resources:
        limits:
          nvidia.com/gpu: 4  # Request 4 GPUs
          memory: "128Gi"
      volumeMounts:
        - name: data
          mountPath: /data
        - name: output
          mountPath: /output
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: training-data
    - name: output
      persistentVolumeClaim:
        claimName: training-output
  restartPolicy: Never
```
Apply to io.net cluster:
```bash
kubectl apply -f pytorch-training.yaml

# Monitor pod
kubectl get pods -w
# Example output (illustrative; the GPU column is not part of default kubectl output)
# NAME               READY   STATUS    RESTARTS   AGE   GPU
# pytorch-training   1/1     Running   0          45s   4/4

# View logs
kubectl logs pytorch-training -f

# Check GPU utilization
kubectl exec pytorch-training -- nvidia-smi
```
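`nvidia-smi` also has a machine-readable query mode (`--query-gpu` with `--format=csv,noheader,nounits`) that is easier to track over time than the default table. A minimal sketch, assuming `kubectl` is configured for the cluster and the container image ships `nvidia-smi`; the function names and sample values are hypothetical.

```python
import subprocess

QUERY = "index,utilization.gpu,memory.used,memory.total"


def read_gpu_stats(pod: str) -> str:
    """Run nvidia-smi inside the pod and return its CSV output (requires kubectl access)."""
    cmd = ["kubectl", "exec", pod, "--", "nvidia-smi",
           f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def parse_gpu_stats(csv_text: str) -> list:
    """Turn lines like '0, 87, 40213, 81920' into dicts (memory values are in MiB)."""
    stats = []
    for line in csv_text.strip().splitlines():
        index, util, used, total = (int(field) for field in line.split(","))
        stats.append({"index": index, "util_pct": util,
                      "mem_used_mib": used, "mem_total_mib": total})
    return stats


if __name__ == "__main__":
    # Illustrative output for two GPUs; use read_gpu_stats("pytorch-training") on a real cluster.
    sample = "0, 87, 40213, 81920\n1, 91, 40198, 81920\n"
    for gpu in parse_gpu_stats(sample):
        print(f"GPU {gpu['index']}: {gpu['util_pct']}% util, "
              f"{gpu['mem_used_mib']}/{gpu['mem_total_mib']} MiB")
```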
Auto-Scaling GPU Nodes
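Horizontal Pod Autoscaler (HPA), which scales the number of pod replicas:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-inference
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: request_queue_depth
        target:
          type: AverageValue
          averageValue: "50"
```
`request_queue_depth` is a custom per-pod metric, so the HPA can only read it if a custom metrics adapter (for example, Prometheus Adapter) serves it through the `custom.metrics.k8s.io` API. When several metrics are listed, the HPA computes a replica count for each and uses the largest.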
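The HPA's core rule, as documented by Kubernetes, is `desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue)`, skipped when the ratio is within a tolerance of 1.0 (0.1 by default), evaluated per metric, with the largest result clamped to `minReplicas`/`maxReplicas`. A simplified sketch of that calculation; it omits the scale-down stabilization window (300 s by default) and the handling of not-ready pods and missing metrics, and the function names are hypothetical.

```python
import math


def replicas_for_metric(current_replicas: int, current_value: float,
                        target_value: float, tolerance: float = 0.1) -> int:
    """desiredReplicas = ceil(currentReplicas * currentValue / targetValue)."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no change
    return math.ceil(current_replicas * ratio)


def desired_replicas(current_replicas: int, metrics: list, min_replicas: int,
                     max_replicas: int) -> int:
    """metrics is a list of (current_value, target_value) pairs; the largest proposal wins."""
    proposal = max(replicas_for_metric(current_replicas, cur, tgt) for cur, tgt in metrics)
    return max(min_replicas, min(max_replicas, proposal))


if __name__ == "__main__":
    # 4 replicas at 84% average CPU (target 70) and 120 queued requests per pod (target 50)
    print(desired_replicas(4, [(84, 70), (120, 50)], min_replicas=2, max_replicas=20))  # 10
```

Here the CPU metric alone would ask for 5 replicas, but the queue-depth metric asks for 10, so the queue wins.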
Cluster Autoscaler:
io.net automatically adds/removes nodes based on pod scheduling needs:
```bash
# Configure cluster autoscaling
io k8s configure-autoscaler ml-cluster \
  --min-nodes 2 \
  --max-nodes 50 \
  --scale-down-delay 5m \
  --scale-up-threshold 80

# Autoscaler behavior:
# - Scale up: when pods are unschedulable (GPU requests exceed capacity)
# - Scale down: when node utilization < 50% for 5+ minutes
# - New node ready: <2 minutes (vs. 5-10 minutes on AWS EKS)
```
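The scale-up and scale-down rules listed above can be read as a simple control loop. This is an illustration of those two rules only, not io.net's implementation: it assumes one new node per unschedulable pod, uses the min/max node counts and 5-minute delay from the command above with the 50% threshold, and skips checks a real autoscaler makes (such as whether a removed node's pods fit elsewhere). `NodeState` and `plan_scaling` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    name: str
    gpu_utilization: float  # requested / allocatable GPUs, 0.0 to 1.0
    low_for_s: float        # seconds spent continuously below the scale-down threshold


def plan_scaling(nodes, unschedulable_pods, min_nodes=2, max_nodes=50,
                 low_threshold=0.5, scale_down_delay_s=300):
    """Return ("scale_up", n), ("scale_down", [node names]) or ("noop", None)."""
    # Pending pods take priority: add capacity before removing any.
    if unschedulable_pods > 0:
        to_add = min(unschedulable_pods, max_nodes - len(nodes))  # assumes one node per pod
        return ("scale_up", to_add) if to_add > 0 else ("noop", None)
    # Remove nodes that stayed under the threshold for the full delay, keeping min_nodes.
    idle = [n.name for n in nodes
            if n.gpu_utilization < low_threshold and n.low_for_s >= scale_down_delay_s]
    removable = max(0, len(nodes) - min_nodes)
    if idle and removable:
        return ("scale_down", idle[:removable])
    return ("noop", None)


if __name__ == "__main__":
    cluster = [NodeState("node-0", 0.9, 0), NodeState("node-1", 0.2, 600),
               NodeState("node-2", 0.3, 120)]
    print(plan_scaling(cluster, unschedulable_pods=0))  # ('scale_down', ['node-1'])
```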
Multi-GPU Distributed Training
PyTorch DDP with Multiple Pods:
```yaml
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: llama-training
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: pytorch
              image: pytorch/pytorch:latest
              # The training operator injects RANK (0 for the master, 1-3 for workers),
              # MASTER_ADDR and MASTER_PORT into every replica
              command:
                - sh
                - -c
                - >-
                  torchrun --nnodes=4 --nproc_per_node=4
                  --node_rank=$RANK
                  --master_addr=$MASTER_ADDR --master_port=$MASTER_PORT
                  train.py
              resources:
                limits:
                  nvidia.com/gpu: 4
    Worker:
      replicas: 3
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: pytorch
              image: pytorch/pytorch:latest
              command:
                - sh
                - -c
                - >-
                  torchrun --nnodes=4 --nproc_per_node=4
                  --node_rank=$RANK
                  --master_addr=$MASTER_ADDR --master_port=$MASTER_PORT
                  train.py
              resources:
                limits:
                  nvidia.com/gpu: 4
```
Total: 16 GPUs (1 master × 4 + 3 workers × 4)
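The 16-GPU total is the `torch.distributed` world size: pods × processes per pod. A minimal sketch of how ranks map onto pods, assuming static rendezvous with one `torchrun` per pod and node ranks ordered master first, then workers (the order the Kubeflow training operator uses for `RANK`). Pod names follow the `<job>-master-N` / `<job>-worker-N` pattern used in the log commands; `rank_layout` is a hypothetical helper.

```python
def rank_layout(job_name: str, masters: int, workers: int, nproc_per_node: int):
    """Return (world_size, [(pod, local_rank, global_rank), ...])."""
    # Node rank follows the operator's RANK: master first, then worker-0, worker-1, ...
    pods = [f"{job_name}-master-{i}" for i in range(masters)]
    pods += [f"{job_name}-worker-{i}" for i in range(workers)]
    world_size = len(pods) * nproc_per_node
    layout = []
    for node_rank, pod in enumerate(pods):
        for local_rank in range(nproc_per_node):
            # Static assignment: global rank = node_rank * nproc_per_node + local_rank
            layout.append((pod, local_rank, node_rank * nproc_per_node + local_rank))
    return world_size, layout


if __name__ == "__main__":
    world_size, layout = rank_layout("llama-training", masters=1, workers=3, nproc_per_node=4)
    print(world_size)  # 16
    for pod, local_rank, global_rank in layout[:5]:
        print(pod, local_rank, global_rank)
```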
Deploy with:
```bash
kubectl apply -f pytorchjob.yaml

# Monitor training across all pods
kubectl logs -f llama-training-master-0
kubectl logs -f llama-training-worker-0
```
Persistent Storage
Create Persistent Volume Claim:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-storage
spec:
  accessModes:
    - ReadWriteMany  # Multiple pods can mount
  storageClassName: ionet-nvme-ssd
  resources:
    requests:
      storage: 500Gi
```
Mount in Pod:
```yaml
spec:
  volumes:
    - name: models
      persistentVolumeClaim:
        claimName: model-storage
  containers:
    - name: app
      volumeMounts:
        - name: models
          mountPath: /models
```
Storage classes available:
- ionet-nvme-ssd: High-performance NVMe (default)
- ionet-ssd: Standard SSD
- ionet-hdd: High-capacity HDD (archival)
Inference Deployment with Load Balancing
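```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: llm-api
  template:
    metadata:
      labels:
        app: llm-api
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          # The vllm-openai image takes the model as a server argument
          args: ["--model", "meta-llama/Meta-Llama-3-8B-Instruct"]
          env:
            # Llama 3 is gated on Hugging Face; "hf-token" is a placeholder
            # Secret name holding your access token
            - name: HUGGING_FACE_HUB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hf-token
                  key: token
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: 1
---
apiVersion: v1
kind: Service
metadata:
  name: llm-api-service
spec:
  type: LoadBalancer
  selector:
    app: llm-api
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8000
```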
Deploy:
```bash
kubectl apply -f llm-deployment.yaml

# Get external endpoint
kubectl get service llm-api-service
# NAME              TYPE           EXTERNAL-IP       PORT(S)        AGE
# llm-api-service   LoadBalancer   xxx.ionet.cloud   80:30123/TCP   2m

# Test API (the Service listens on plain HTTP, port 80)
curl http://xxx.ionet.cloud/v1/models
```
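The vLLM server behind the Service exposes an OpenAI-compatible HTTP API, so besides `/v1/models` it accepts `POST /v1/completions`. A minimal standard-library client sketch; `xxx.ionet.cloud` stands in for whatever EXTERNAL-IP your Service reports, and `completion_request` and `complete` are hypothetical helper names.

```python
import json
import urllib.request


def completion_request(base_url: str, model: str, prompt: str,
                       max_tokens: int = 64) -> urllib.request.Request:
    """Build a POST /v1/completions request for an OpenAI-compatible server."""
    body = json.dumps({"model": model, "prompt": prompt, "max_tokens": max_tokens})
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/completions",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url: str, model: str, prompt: str, max_tokens: int = 64) -> str:
    """Send the request and return the first choice's generated text."""
    request = completion_request(base_url, model, prompt, max_tokens)
    with urllib.request.urlopen(request, timeout=120) as response:
        payload = json.load(response)
    return payload["choices"][0]["text"]
```

For example, `complete("http://xxx.ionet.cloud", "meta-llama/Meta-Llama-3-8B-Instruct", "Kubernetes is")`. The `model` field must match the name the server was started with.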
GPU Scheduling Strategies
Node Affinity (specific GPU types):
```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: gpu.ionet.io/type
                operator: In
                values:
                  - H100
                  - A100
```
Pod Anti-Affinity (distribute across nodes):
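```yaml
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - llm-inference
          topologyKey: kubernetes.io/hostname
```
Because this rule is `required`, each node runs at most one `app: llm-inference` pod, and replicas beyond the number of eligible nodes stay Pending. `preferredDuringSchedulingIgnoredDuringExecution` gives a soft spread instead.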
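The two filter rules above can be modeled in a few lines to predict how many replicas will actually run. This sketch assumes both rules are applied to the same Deployment (the examples above are separate), uses the `gpu.ionet.io/type` label from the node-affinity example, and ignores resource fit and node scoring, which the real scheduler also performs. Node names and the helper functions are hypothetical.

```python
def matches_in(labels: dict, key: str, values) -> bool:
    """nodeAffinity `In`: the label must exist and its value must be in `values`."""
    return key in labels and labels[key] in values


def eligible_nodes(nodes: dict, key: str = "gpu.ionet.io/type",
                   values=("H100", "A100")) -> list:
    """Filter node names by the required node-affinity rule."""
    return [name for name, labels in nodes.items() if matches_in(labels, key, values)]


def place_replicas(replicas: int, nodes: dict) -> tuple:
    """Required anti-affinity on kubernetes.io/hostname: at most one replica per node."""
    candidates = eligible_nodes(nodes)
    placed = candidates[:replicas]
    pending = replicas - len(placed)
    return placed, pending


if __name__ == "__main__":
    cluster = {
        "node-a": {"gpu.ionet.io/type": "H100"},
        "node-b": {"gpu.ionet.io/type": "A100"},
        "node-c": {"gpu.ionet.io/type": "L40S"},
    }
    print(place_replicas(3, cluster))  # (['node-a', 'node-b'], 1)
```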
Kubeflow Integration
Deploy Kubeflow on io.net cluster:
```bash
# Install Kubeflow Pipelines (platform-agnostic manifests)
kubectl apply -k "github.com/kubeflow/manifests/apps/pipeline/upstream/env/platform-agnostic"

# Create Jupyter notebook server with GPU
# (the Notebook resource also requires the Kubeflow notebook controller)
kubectl apply -f - <<EOF
apiVersion: kubeflow.org/v1
kind: Notebook
metadata:
  name: gpu-notebook
spec:
  template:
    spec:
      containers:
        - name: notebook
          image: jupyter/tensorflow-notebook:latest
          resources:
            limits:
              nvidia.com/gpu: 1
          volumeMounts:
            - name: workspace
              mountPath: /home/jovyan
      volumes:
        - name: workspace
          persistentVolumeClaim:
            claimName: notebook-pvc
EOF

# Access notebook (the controller runs it as StatefulSet pod gpu-notebook-0)
kubectl port-forward pod/gpu-notebook-0 8888:8888
# Open http://localhost:8888
```
Monitoring and Observability
GPU Metrics with Prometheus:
```bash
# io.net clusters include a GPU metrics exporter by default
kubectl get pods -n kube-system | grep gpu-metrics-exporter

# Query GPU utilization
curl 'http://prometheus.ionet.cloud/api/v1/query?query=gpu_utilization_percent'

# Grafana dashboard
# Pre-configured at https://grafana.xxx.ionet.cloud
# Includes: GPU util, memory, temperature, power, per-pod metrics
```
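Scripts that read GPU metrics from the Prometheus HTTP API need to URL-encode the PromQL and unpack the instant-vector response, where each sample's value is a `[timestamp, "string value"]` pair. A minimal sketch using the metric name from the query above; the `gpu` label in the sample response and the helper names are hypothetical.

```python
import json
import urllib.parse


def query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL; PromQL needs URL-encoding (braces, quotes, parentheses)."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def vector_samples(response_text: str) -> list:
    """Return (labels, value) pairs from an instant-vector query response."""
    doc = json.loads(response_text)
    if doc.get("status") != "success":
        raise RuntimeError(doc.get("error", "query failed"))
    data = doc["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample's value is [unix_timestamp, "value as string"].
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


if __name__ == "__main__":
    print(query_url("http://prometheus.ionet.cloud", "avg(gpu_utilization_percent)"))
    sample = json.dumps({"status": "success", "data": {"resultType": "vector", "result": [
        {"metric": {"gpu": "0"}, "value": [1700000000, "80"]},
        {"metric": {"gpu": "1"}, "value": [1700000000, "60"]},
    ]}})
    values = [v for _, v in vector_samples(sample)]
    print(sum(values) / len(values))  # 70.0
```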
Custom Metrics:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-metrics
data:
  config.yaml: |
    metrics:
      - name: tokens_per_second
        type: gauge
        help: "Inference throughput"
      - name: queue_depth
        type: gauge
        help: "Request queue depth"
```
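The ConfigMap only declares names, types, and help strings; the source does not say which component consumes it. Whatever exposes these gauges to Prometheus has to emit the text exposition format (served at `/metrics` with `Content-Type: text/plain; version=0.0.4`). A minimal sketch of that output for the two declared gauges; `render_exposition` is a hypothetical helper and the values are made up.

```python
METRICS = [
    {"name": "tokens_per_second", "type": "gauge", "help": "Inference throughput"},
    {"name": "queue_depth", "type": "gauge", "help": "Request queue depth"},
]


def render_exposition(metrics: list, values: dict) -> str:
    """Render metrics in the Prometheus text format: HELP, TYPE, then one sample line."""
    lines = []
    for metric in metrics:
        name = metric["name"]
        # HELP text must escape backslashes and newlines.
        help_text = metric["help"].replace("\\", "\\\\").replace("\n", "\\n")
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric['type']}")
        lines.append(f"{name} {float(values[name])!r}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_exposition(METRICS, {"tokens_per_second": 1523.5, "queue_depth": 7}), end="")
```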
Cost Comparison: io.net vs. AWS EKS
Equivalent Workload (8-node cluster, A100 GPUs, 24/7):
| Component | AWS EKS | io.net K8s | Savings |
|---|---|---|---|
| Control Plane | $72/month | Included | $72/month |
| 8x GPU Nodes (A100) | $880/hour | $242/hour | $638/hour |
| Load Balancer | $16/month | Included | $16/month |
| Storage (2TB NVMe) | $204/month | $100/month | $104/month |
| Data Transfer | $0.09/GB | $0.05/GB | 44% |
| Monthly Total (730 hrs, excl. data transfer) | $643,000 | $177,000 | $466,000 (72%) |
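The monthly totals follow from the hourly GPU rates times 730 hours plus the fixed monthly items; data transfer is usage-based and excluded. A short check of that arithmetic using only the table's own figures (the table rounds to the nearest $1,000).

```python
HOURS_PER_MONTH = 730

# Monthly line items from the table; GPU nodes are billed hourly.
aws = {"control_plane": 72, "gpu_nodes": 880 * HOURS_PER_MONTH,
       "load_balancer": 16, "storage": 204}
ionet = {"control_plane": 0, "gpu_nodes": 242 * HOURS_PER_MONTH,
         "load_balancer": 0, "storage": 100}

aws_total = sum(aws.values())        # 642,692
ionet_total = sum(ionet.values())    # 176,760
savings = aws_total - ionet_total    # 465,932
gpu_savings_per_hour = 880 - 242     # 638

if __name__ == "__main__":
    print(f"AWS EKS:  ${aws_total:,}/month")
    print(f"io.net:   ${ionet_total:,}/month")
    print(f"Savings:  ${savings:,} ({savings / aws_total:.1%})")  # 72.5%
```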
Helm Charts Support
Deploy popular applications:
```bash
# Add io.net Helm repository
helm repo add ionet https://charts.io.net
helm repo update

# Install Ray cluster with GPUs
helm install ray-cluster ionet/ray \
  --set head.resources.limits.nvidia\\.com/gpu=1 \
  --set worker.replicas=4 \
  --set worker.resources.limits.nvidia\\.com/gpu=4

# Install MLflow tracking server
helm install mlflow ionet/mlflow \
  --set storage.size=100Gi \
  --set postgresql.enabled=true

# Install Argo Workflows for ML pipelines
helm install argo ionet/argo-workflows
```
Best Practices
- Use resource requests and limits. For `nvidia.com/gpu`, the request must equal the limit (or be omitted, in which case it defaults to the limit):
  ```yaml
  resources:
    requests:
      nvidia.com/gpu: 1
      memory: "16Gi"
    limits:
      nvidia.com/gpu: 1
      memory: "16Gi"
  ```
- Enable pod disruption budgets:
  ```yaml
  apiVersion: policy/v1
  kind: PodDisruptionBudget
  metadata:
    name: llm-api-pdb
  spec:
    minAvailable: 2
    selector:
      matchLabels:
        app: llm-api
  ```
- Use readiness probes for inference:
  ```yaml
  readinessProbe:
    httpGet:
      path: /health
      port: 8000
    initialDelaySeconds: 60
    periodSeconds: 10
  ```
- Tag nodes for GPU types:
  ```bash
  kubectl label nodes ionet-node-1 gpu-type=H100
  kubectl label nodes ionet-node-2 gpu-type=A100
  ```
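Kubernetes treats `nvidia.com/gpu` as an extended resource: a GPU request needs a matching, equal limit (a limit alone is fine, since the request defaults to it), and the value must be a whole number. A small pre-apply check of those rules on a container spec parsed into a dict; `gpu_resource_errors` is a hypothetical helper, not part of any Kubernetes tool.

```python
GPU = "nvidia.com/gpu"


def gpu_resource_errors(container: dict) -> list:
    """Check a container spec against Kubernetes' rules for GPU requests and limits."""
    name = container.get("name", "<unnamed>")
    resources = container.get("resources", {})
    request = resources.get("requests", {}).get(GPU)
    limit = resources.get("limits", {}).get(GPU)
    errors = []
    if request is not None and limit is None:
        errors.append(f"{name}: {GPU} request set without a limit")
    elif request is not None and str(request) != str(limit):
        errors.append(f"{name}: {GPU} request ({request}) must equal limit ({limit})")
    # Extended resources cannot be fractional.
    for value in sorted({str(v) for v in (request, limit) if v is not None}):
        if not value.isdigit():
            errors.append(f"{name}: {GPU} must be a whole number, got {value!r}")
    return errors


if __name__ == "__main__":
    container = {"name": "vllm", "resources": {"requests": {GPU: 2}, "limits": {GPU: 1}}}
    for problem in gpu_resource_errors(container):
        print(problem)
```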
Deploy Kubernetes GPU clusters on io.net with 70% cost savings vs. AWS EKS.
