Yes. io.net provides full Kubernetes support for deploying and managing GPU workloads through native CLI integration. You can create GPU-enabled Kubernetes clusters with automatic GPU scheduling, auto-scaling, and persistent storage—all configured in minutes without manual cluster provisioning or node management.

The io.net Kubernetes implementation supports standard kubectl commands, Helm charts, and popular operators (Kubeflow, Ray, MLflow), while abstracting away the infrastructure complexity. Deploy GPU pods using familiar nvidia.com/gpu resource limits, and io.net automatically provisions the appropriate GPU nodes and manages the cluster lifecycle.

Quick Start: Create GPU Cluster

# Install io.net CLI
pip install ionet-cli
io login

# Create Kubernetes cluster with GPU nodes
io k8s create-cluster \
  --name ml-cluster \
  --gpu-type A100 \
  --min-nodes 2 \
  --max-nodes 10 \
  --region us-west

# Get kubeconfig
io k8s get-credentials ml-cluster

# Verify cluster
kubectl get nodes
NAME                         STATUS   GPU
ionet-ml-cluster-node-0     Ready    A100
ionet-ml-cluster-node-1     Ready    A100

# Check GPU resources
kubectl describe node ionet-ml-cluster-node-0 | grep nvidia.com/gpu
  nvidia.com/gpu:     1
  nvidia.com/gpu:     1

Deploy GPU Workload

Standard Kubernetes GPU pod manifest:

apiVersion: v1
kind: Pod
metadata:
  name: pytorch-training
spec:
  containers:
  - name: pytorch
    image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
    command: ["python", "train.py"]
    resources:
      limits:
        nvidia.com/gpu: 4  # Request 4 GPUs
        memory: "128Gi"
    volumeMounts:
    - name: data
      mountPath: /data
    - name: output
      mountPath: /output
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: training-data
  - name: output
    persistentVolumeClaim:
      claimName: training-output
  restartPolicy: Never

Apply to io.net cluster:

kubectl apply -f pytorch-training.yaml

# Monitor pod
kubectl get pods -w
NAME               READY   STATUS    RESTARTS   AGE   GPU
pytorch-training   1/1     Running   0          45s   4/4

# View logs
kubectl logs pytorch-training -f

# Check GPU utilization
kubectl exec pytorch-training -- nvidia-smi

Auto-Scaling GPU Nodes

Horizontal Pod Autoscaler (HPA):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-inference
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: request_queue_depth
      target:
        type: AverageValue
        averageValue: "50"

Cluster Autoscaler:

io.net automatically adds/removes nodes based on pod scheduling needs:

# Configure cluster autoscaling
io k8s configure-autoscaler ml-cluster \
  --min-nodes 2 \
  --max-nodes 50 \
  --scale-down-delay 5m \
  --scale-up-threshold 80

# Autoscaler behavior:
# - Scale up: When pods are unschedulable (GPU requests exceed capacity)
# - Scale down: When node utilization < 50% for 5+ minutes
# - New node ready: <2 minutes (vs. 5-10 minutes on AWS EKS)

Multi-GPU Distributed Training

PyTorch DDP with Multiple Pods:

apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: llama-training
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
          - name: pytorch
            image: pytorch/pytorch:latest
            command:
              - python
              - -m
              - torch.distributed.launch
              - --nproc_per_node=4
              - train.py
            resources:
              limits:
                nvidia.com/gpu: 4
    Worker:
      replicas: 3
      restartPolicy: OnFailure
      template:
        spec:
          containers:
          - name: pytorch
            image: pytorch/pytorch:latest
            command:
              - python
              - -m
              - torch.distributed.launch
              - --nproc_per_node=4
              - train.py
            resources:
              limits:
                nvidia.com/gpu: 4

Total: 16 GPUs (1 master × 4 + 3 workers × 4)

Deploy with:

kubectl apply -f pytorchjob.yaml

# Monitor training across all pods
kubectl logs -f llama-training-master-0
kubectl logs -f llama-training-worker-0

Persistent Storage

Create Persistent Volume Claim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-storage
spec:
  accessModes:
    - ReadWriteMany  # Multiple pods can mount
  storageClassName: ionet-nvme-ssd
  resources:
    requests:
      storage: 500Gi

Mount in Pod:

spec:
  volumes:
  - name: models
    persistentVolumeClaim:
      claimName: model-storage
  containers:
  - name: app
    volumeMounts:
    - name: models
      mountPath: /models

Storage classes available:
ionet-nvme-ssd: High-performance NVMe (default)
ionet-ssd: Standard SSD
ionet-hdd: High-capacity HDD (archival)

Inference Deployment with Load Balancing

apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: llm-api
  template:
    metadata:
      labels:
        app: llm-api
    spec:
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest
        env:
        - name: MODEL
          value: "meta-llama/Meta-Llama-3-8B-Instruct"
        ports:
        - containerPort: 8000
        resources:
          limits:
            nvidia.com/gpu: 1
---
apiVersion: v1
kind: Service
metadata:
  name: llm-api-service
spec:
  type: LoadBalancer
  selector:
    app: llm-api
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8000

Deploy:

kubectl apply -f llm-deployment.yaml

# Get external endpoint
kubectl get service llm-api-service
NAME              TYPE           EXTERNAL-IP        PORT(S)        AGE
llm-api-service   LoadBalancer   xxx.ionet.cloud   80:30123/TCP   2m

# Test API
curl https://xxx.ionet.cloud/v1/models

GPU Scheduling Strategies

Node Affinity (specific GPU types):

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: gpu.ionet.io/type
            operator: In
            values:
            - H100
            - A100

Pod Anti-Affinity (distribute across nodes):

spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - llm-inference
        topologyKey: kubernetes.io/hostname

Kubeflow Integration

Deploy Kubeflow on io.net cluster:

# Install Kubeflow operator
kubectl apply -k "github.com/kubeflow/manifests/apps/pipeline/upstream/env/platform-agnostic"

# Create Jupyter notebook server with GPU
kubectl apply -f - <<EOF
apiVersion: kubeflow.org/v1
kind: Notebook
metadata:
  name: gpu-notebook
spec:
  template:
    spec:
      containers:
      - name: notebook
        image: jupyter/tensorflow-notebook:latest
        resources:
          limits:
            nvidia.com/gpu: 1
        volumeMounts:
        - name: workspace
          mountPath: /home/jovyan
      volumes:
      - name: workspace
        persistentVolumeClaim:
          claimName: notebook-pvc
EOF

# Access notebook
kubectl port-forward service/gpu-notebook 8888:8888
# Open http://localhost:8888

Monitoring and Observability

GPU Metrics with Prometheus:

# io.net clusters include GPU metrics exporter by default
kubectl get pods -n kube-system | grep gpu-metrics-exporter

# Query GPU utilization
curl 'http://prometheus.ionet.cloud/api/v1/query?query=gpu_utilization_percent'

# Grafana dashboard
# Pre-configured at https://grafana.xxx.ionet.cloud
# Includes: GPU util, memory, temperature, power, per-pod metrics

Custom Metrics:

apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-metrics
data:
  config.yaml: |
    metrics:
    - name: tokens_per_second
      type: gauge
      help: "Inference throughput"
    - name: queue_depth
      type: gauge
      help: "Request queue depth"

Cost Comparison: io.net vs. AWS EKS

Equivalent Workload (8-node cluster, A100 GPUs, 24/7):

ComponentAWS EKSio.net K8sSavings
Control Plane$72/monthIncluded$72
8x GPU Nodes (A100)$880/hour$242/hour$458/hr
Load Balancer$16/monthIncluded$16
Storage (2TB NVMe)$204/month$100/month$104
Data Transfer$0.09/GB$0.05/GB44%
Monthly Total (730 hrs)$643,000$177,000$466,000 (72%)

Helm Charts Support

Deploy popular applications:

# Add io.net Helm repository
helm repo add ionet https://charts.io.net
helm repo update

# Install Ray cluster with GPUs
helm install ray-cluster ionet/ray \
  --set head.resources.limits.nvidia\\.com/gpu=1 \
  --set worker.replicas=4 \
  --set worker.resources.limits.nvidia\\.com/gpu=4

# Install MLflow tracking server
helm install mlflow ionet/mlflow \
  --set storage.size=100Gi \
  --set postgresql.enabled=true

# Install Argo Workflows for ML pipelines
helm install argo ionet/argo-workflows

Best Practices

  1. Use resource requests and limits:
    yaml resources: requests: nvidia.com/gpu: 1 memory: "16Gi" limits: nvidia.com/gpu: 1 memory: "16Gi"
  2. Enable pod disruption budgets:
    yaml apiVersion: policy/v1 kind: PodDisruptionBudget metadata: name: llm-api-pdb spec: minAvailable: 2 selector: matchLabels: app: llm-api
  3. Use readiness probes for inference:
    yaml readinessProbe: httpGet: path: /health port: 8000 initialDelaySeconds: 60 periodSeconds: 10
  4. Tag nodes for GPU types:
    bash kubectl label nodes ionet-node-1 gpu-type=H100 kubectl label nodes ionet-node-2 gpu-type=A100

Deploy Kubernetes GPU clusters on io.net with 70% cost savings vs. AWS EKS.