FAQ: Can I migrate from AWS to io.net?

Yes. Migrating GPU workloads from AWS (EC2, SageMaker, EKS) to io.net is straightforward and typically takes 1-3 days for most deployments. The process involves containerizing your workload (if not already), transferring data, updating endpoint URLs, and redeploying on io.net's infrastructure. Organizations save 50-70% on GPU costs while maintaining comparable performance and reliability.

io.net supports the same GPU types as AWS (H100, A100, A10G), runs standard Docker containers, and provides equivalent networking and storage primitives. The migration path is designed for zero disruption: run workloads in parallel on both platforms during testing, then cut over when validated.

Migration Complexity by Workload Type

Workload Type	Complexity	Migration Time	Key Considerations
Containerized ML training	Low	1-2 hours	Direct port, minimal changes
Inference API (containerized)	Low	2-4 hours	Update DNS, load test
SageMaker training jobs	Medium	1-2 days	Convert to container, adapt dataset loading
EC2 instances (manual setup)	Medium	2-3 days	Containerize environment, document dependencies
EKS GPU clusters	Medium-High	3-5 days	Port Kubernetes manifests, test scaling
Batch processing pipelines	Low-Medium	1-2 days	Adapt job scheduler, validate outputs

Step-by-Step Migration Guide

Phase 1: Assessment (1-2 hours)

Inventory AWS GPU usage:

# List all EC2 GPU instances
aws ec2 describe-instances \
  --filters "Name=instance-type,Values=p*,g*" \
  --query 'Reservations[].Instances[].[InstanceId,InstanceType,State.Name]'

# Get SageMaker training jobs (last 30 days)
aws sagemaker list-training-jobs \
  --max-results 100 \
  --creation-time-after $(date -d '30 days ago' +%Y-%m-%d)

# Estimate monthly GPU costs
aws ce get-cost-and-usage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY \
  --filter file://gpu-filter.json \
  --metrics BlendedCost

Calculate potential savings:

AWS Cost Example (p4d.24xlarge with 8x A100):
- On-demand: $32.77/hour
- 12 hours/day × 30 days = 360 hours/month
- Monthly cost: $11,797

io.net Equivalent (8x A100):
- On-demand: $8.80/hour
- Same usage: 360 hours/month
- Monthly cost: $3,168
- Savings: $8,629/month (73%)

Identify dependencies:
- [ ] AWS-specific services (S3, EBS, VPC, IAM)
- [ ] Custom AMIs or EC2 user data scripts
- [ ] Security groups and networking configurations
- [ ] Monitoring and logging (CloudWatch)
- [ ] Data sources (RDS, DynamoDB, S3)

Phase 2: Containerization (4-8 hours if needed)

If workload isn't containerized:

# Example: Convert EC2 PyTorch environment to Docker
FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime

# Install dependencies from requirements.txt
COPY requirements.txt /workspace/
RUN pip install -r /workspace/requirements.txt

# Copy application code
COPY ./src /workspace/src
COPY ./models /workspace/models

# Set working directory
WORKDIR /workspace

# Entry point
CMD ["python", "src/train.py"]

Build and test locally:

docker build -t my-training-job:latest .
docker run --gpus all -it my-training-job:latest

Phase 3: Data Migration (varies by dataset size)

Option A: Direct Transfer (< 1TB)

# Upload to io.net volume from AWS S3
aws s3 cp s3://my-bucket/dataset.tar.gz - | \
  io exec --instance my-gpu -- tar xzf - -C /data/

Option B: Parallel Transfer (1TB+)

# Use io.net S3-compatible storage
io storage create --name my-dataset --size 2TB

# Multi-threaded sync from AWS S3
s5cmd --numworkers 32 cp \
  s3://my-aws-bucket/* \
  https://storage.io.net/my-dataset/

Option C: Dataset Streaming

# Stream from S3 during training (no migration needed)
import boto3
from torch.utils.data import IterableDataset

class S3Dataset(IterableDataset):
    def __init__(self, bucket, prefix):
        self.s3 = boto3.client('s3')
        self.bucket = bucket
        self.prefix = prefix

    def __iter__(self):
        # Stream data directly from S3
        for obj in self.s3.list_objects_v2(Bucket=self.bucket, Prefix=self.prefix):
            data = self.s3.get_object(Bucket=self.bucket, Key=obj['Key'])
            yield process(data['Body'].read())

# Works identically on io.net (AWS credentials remain valid)

Phase 4: Deploy on io.net (1-2 hours)

# 1. Install io.net CLI
pip install ionet-cli
io login

# 2. Deploy training job
io deploy --image my-training-job:latest \
  --gpu A100 --count 8 \
  --memory 480GB \
  --storage 1TB \
  --env AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
  --env AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
  --name training-job

# 3. Monitor deployment
io logs --instance training-job --follow

# 4. Check GPU utilization
io exec --instance training-job -- nvidia-smi

Phase 5: Validation (2-4 hours)

# Run parallel test: AWS vs. io.net
# Compare:
# - Training throughput (samples/sec)
# - Final model accuracy
# - Total training time
# - Network latency to data sources

# Example validation script
python validate_migration.py \
  --aws-model s3://aws-bucket/model.pth \
  --ionet-model https://storage.io.net/my-dataset/model.pth \
  --test-dataset s3://test-data/ \
  --metrics accuracy,f1_score

Phase 6: Cutover (1 hour)

# For inference workloads:
# 1. Deploy on io.net
io deploy --image my-api:latest --gpu A100 --port 8000

# 2. Update DNS to point to io.net endpoint
# AWS Route53 or CloudFlare
# Old: api.example.com → AWS Load Balancer
# New: api.example.com → io.net endpoint (xxx.ionet.cloud)

# 3. Monitor traffic and error rates
# 4. Gradually shift traffic (10% → 50% → 100%)
# 5. Decommission AWS resources after 7-day validation

Common Migration Scenarios

Scenario 1: SageMaker Training Job → io.net

AWS SageMaker:

import sagemaker
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point='train.py',
    role='arn:aws:iam::xxx:role/SageMakerRole',
    instance_type='ml.p4d.24xlarge',
    instance_count=1,
    framework_version='2.0',
    py_version='py310'
)

estimator.fit('s3://my-bucket/data')

io.net equivalent:

# Containerize SageMaker script
docker build -t sagemaker-port:latest \
  -f Dockerfile.sagemaker .

# Deploy on io.net
io deploy --image sagemaker-port:latest \
  --gpu A100 --count 8 \
  --env S3_BUCKET=my-bucket \
  --env S3_PREFIX=data

Scenario 2: EKS GPU Cluster → io.net

AWS EKS manifest:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: pytorch
    image: pytorch/pytorch:latest
    resources:
      limits:
        nvidia.com/gpu: 4

io.net equivalent:

# io.net has Kubernetes support
io k8s create-cluster --name my-cluster

# Apply same manifests
kubectl --context ionet apply -f gpu-pod.yaml

# Or use simplified CLI
io deploy --image pytorch/pytorch:latest --gpu A100 --count 4

Scenario 3: EC2 Inference API → io.net

AWS setup:

EC2 (p3.2xlarge with 1x V100) → ELB → Route53
- Instance: $3.06/hour
- Load balancer: $16/month
- Data transfer: $0.09/GB
Monthly cost: ~$2,250 (730 hours)

io.net setup:

io deploy --image my-api:latest \
  --gpu A100 --replicas 2 \
  --autoscale min=1,max=5 \
  --port 443 \
  --domain api.example.com

# Built-in load balancing, auto-scaling, HTTPS
# Cost: $1.10/hour × 730 hours = $803/month
# Savings: $1,447/month (64%)

AWS-Specific Service Replacements

AWS Service	io.net Equivalent	Notes
EC2 P/G instances	io.net GPU instances	Direct replacement
SageMaker Training	Containerized training	Convert to Docker
SageMaker Inference	io.net deployment + vLLM	API-compatible
S3	S3 (access directly) or io.net storage	AWS creds still work
EBS volumes	io.net persistent storage	NVMe SSD, similar performance
VPC	io.net private networking	Isolated networks per deployment
CloudWatch	io.net dashboard + Prometheus	Metrics API available
IAM	io.net RBAC	Team-based access control
ELB	Built-in load balancing	Automatic with replicas

Migration Checklist

[ ] Audit current AWS GPU usage (instance types, hours, costs)
[ ] Calculate io.net savings (use pricing calculator)
[ ] Containerize workloads (if not already Docker-based)
[ ] Identify data dependencies (S3, databases, APIs)
[ ] Plan data migration (streaming vs. one-time transfer)
[ ] Deploy test workload on io.net (validate performance)
[ ] Run parallel for 7 days (compare metrics side-by-side)
[ ] Update DNS/endpoints (cutover to io.net)
[ ] Monitor for 14 days (ensure stability)
[ ] Decommission AWS resources (terminate instances, clean up)

Performance Comparison: AWS vs. io.net

Workload	AWS Config	AWS Cost	io.net Config	io.net Cost	Performance Difference
Llama 3 70B training	8x A100 (p4d.24xlarge)	$32.77/hr	8x A100	$8.80/hr	<5% (comparable)
Stable Diffusion API	1x A10G (g5.2xlarge)	$1.21/hr	1x RTX 4090	$0.18/hr	+15% (faster)
Batch inference	4x V100 (p3.8xlarge)	$12.24/hr	4x A100	$4.40/hr	+80% (much faster)

Ready to migrate? Start on io.net and see 50-70% cost savings immediately.