FAQ: Can I Use io.net for AI Model Training?

Quick Answer

Yes. io.net is purpose-built for AI model training and supports the major deep learning frameworks, including PyTorch, TensorFlow, JAX, and HuggingFace Transformers. You can train everything from small vision models to 70B+ parameter LLMs on a single GPU or on distributed clusters of 100 or more GPUs. With H100s at $2.20/hr (vs. $6.98/hr on AWS), io.net offers 68% cost savings for training workloads. The platform supports full fine-tuning, LoRA, and QLoRA, along with distributed training frameworks such as DeepSpeed, FSDP, and Ray, and its pre-configured containers reduce setup time from hours to minutes.

What AI Training Workloads Run on io.net

io.net supports the full spectrum of AI training use cases:

Large Language Model Training:
- Full pre-training (7B-70B+ parameters)
- Fine-tuning with custom datasets
- LoRA and QLoRA efficient fine-tuning
- Multi-GPU distributed training with DeepSpeed, FSDP
- Instruction tuning and alignment (RLHF, DPO)

Computer Vision:
- Image classification and object detection (ResNet, YOLO, Vision Transformers)
- Semantic segmentation (U-Net, Mask R-CNN)
- Generative models (Stable Diffusion, GANs)
- Video understanding and generation

Audio and Speech:
- Speech recognition (Whisper, Wav2Vec)
- Text-to-speech synthesis
- Audio generation and music models

Multimodal Models:
- Vision-language models (CLIP, BLIP, LLaVA)
- Text-to-image generation
- Video captioning and understanding

Real Training Performance Benchmarks

Here's how different training workloads perform on io.net GPUs:

Model	Task	GPU	Batch Size	Time to Train	Cost on io.net	Cost on AWS	Savings
Llama 3 8B	LoRA fine-tune (10K samples)	1x A100 80GB	4	6 hours	$7.20	$24.60	71%
Llama 3 8B	Full fine-tune (50K samples)	8x A100 80GB	32	48 hours	$573	$1,574	64%
Llama 3 70B	LoRA fine-tune (10K samples)	4x H100 SXM	2	12 hours	$106	$335	68%
Stable Diffusion XL	Train from scratch (100K images)	4x RTX 4090	64	72 hours	$52	N/A	N/A
ResNet-50	ImageNet training (1.2M images)	8x RTX 4090	256	24 hours	$35	N/A	N/A
Whisper Large	Fine-tune on custom audio	2x L40S	16	18 hours	$27	$54	50%

Benchmarks based on standard training configurations. Actual performance varies by hyperparameters and data pipeline efficiency.
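The cost and savings columns follow from simple arithmetic: number of GPUs times the hourly rate per GPU times wall-clock hours. The sketch below reproduces the Llama 3 70B LoRA row, assuming the $2.20/hr (io.net) and $6.98/hr (AWS) H100 rates quoted earlier; the function names are illustrative and not part of any io.net tooling.

```python
def training_cost(num_gpus: int, rate_per_gpu_hour: float, hours: float) -> float:
    """Total cost in dollars: GPUs x hourly rate per GPU x wall-clock hours."""
    return num_gpus * rate_per_gpu_hour * hours


def savings_percent(cost: float, baseline_cost: float) -> float:
    """Percentage saved relative to a baseline provider's cost."""
    return (1 - cost / baseline_cost) * 100


# Llama 3 70B LoRA fine-tune: 4x H100 for 12 hours on each provider.
ionet_cost = training_cost(4, 2.20, 12)
aws_cost = training_cost(4, 6.98, 12)
savings = savings_percent(ionet_cost, aws_cost)

print(f"io.net: ${ionet_cost:.0f}, AWS: ${aws_cost:.0f}, savings: {savings:.0f}%")
# prints: io.net: $106, AWS: $335, savings: 68%
```

The same two functions can be used to sanity-check any other row, or to budget a run before launching it.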

Supported Training Frameworks and Tools

io.net provides pre-configured environments for all major AI frameworks:

Deep Learning Frameworks:
- PyTorch: Full support for PyTorch 2.0+ with torch.compile and FSDP
- TensorFlow: TensorFlow 2.x with XLA acceleration
- JAX: Optimized for large-scale training with pjit and SPMD
- HuggingFace: Transformers, Accelerate, PEFT, TRL pre-installed

Training Optimization:
- DeepSpeed: ZeRO stages 1-3 for memory-efficient training
- FSDP (Fully Sharded Data Parallel): PyTorch native distributed training
- Ray Train: Distributed training orchestration
- Horovod: Multi-GPU and multi-node training
- Flash Attention 2: 2-4x faster attention for transformers

Fine-Tuning Libraries:
- Axolotl: One-config fine-tuning for LLMs
- Unsloth: 2x faster LoRA training with reduced memory
- PEFT (Parameter-Efficient Fine-Tuning): LoRA, QLoRA, prefix tuning
- TRL (Transformer Reinforcement Learning): RLHF and DPO

Pre-configured Containers:

# Launch PyTorch training environment
io launch --gpu A100 --image pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime

# Launch HuggingFace fine-tuning environment
io launch --gpu A100 --image huggingface/transformers-pytorch-gpu:latest

# Launch Axolotl for one-config LLM fine-tuning
io launch --gpu H100 --image winglian/axolotl:main-py3.11-cu121-2.2.1

How to Train Your First Model on io.net

Step 1: Launch a GPU instance

# For LoRA fine-tuning an 8B model
io launch --gpu A100 --count 1 --disk 100GB

# For full fine-tuning a 70B model
io launch --gpu H100 --count 8 --disk 500GB --network nvlink

Step 2: Set up your training environment

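# Install dependencies (run in the shell)
pip install torch transformers accelerate peft datasets wandb

# Load your model and dataset (Python)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
from peft import LoraConfig, get_peft_model

# Gated model: accept the license on the Hugging Face Hub and log in first
model_id = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# "train.jsonl" is a placeholder for your own dataset file
dataset = load_dataset("json", data_files="train.jsonl")

# Configure LoRA
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # confirm only the adapter weights are trainable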

Step 3: Run distributed training (for multi-GPU)

# Using Accelerate for distributed training
accelerate launch --multi_gpu --num_processes 8 train.py \
  --model_name meta-llama/Meta-Llama-3-70B \
  --dataset custom_dataset \
  --batch_size 4 \
  --gradient_accumulation 8 \
  --learning_rate 2e-5 \
  --num_epochs 3

# Using DeepSpeed ZeRO-3 for memory efficiency
deepspeed --num_gpus=8 train.py \
  --deepspeed ds_config_zero3.json \
  --model_name meta-llama/Meta-Llama-3-70B \
  --per_device_train_batch_size 1 \
  --gradient_checkpointing
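The DeepSpeed command above references ds_config_zero3.json without showing it. As a minimal sketch, the script below writes one plausible version of that file. The micro-batch size of 1 mirrors --per_device_train_batch_size in the command; the accumulation steps and the specific ZeRO options are assumptions to tune for your own run, not io.net defaults.

```python
import json

# Assumed values: the micro-batch of 1 matches the command above;
# accumulation steps and GPU count are illustrative.
micro_batch_per_gpu = 1
gradient_accumulation_steps = 8
num_gpus = 8

ds_config = {
    # DeepSpeed checks that train_batch_size equals
    # micro batch x accumulation steps x number of GPUs.
    "train_batch_size": micro_batch_per_gpu * gradient_accumulation_steps * num_gpus,
    "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
    "gradient_accumulation_steps": gradient_accumulation_steps,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # shard parameters, gradients, and optimizer states
        "overlap_comm": True,
        "contiguous_gradients": True,
        # Gather full 16-bit weights when saving so the checkpoint is loadable
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}

with open("ds_config_zero3.json", "w") as f:
    json.dump(ds_config, f, indent=2)

print(f"Effective global batch size: {ds_config['train_batch_size']}")
```

If the three batch-related values disagree with the GPU count actually used at launch, DeepSpeed rejects the config at startup, so it helps to derive them in one place as shown.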

Step 4: Monitor training

# Integrate with Weights & Biases for monitoring
import wandb
wandb.init(project="llama-finetuning")

# Training metrics are logged automatically
# View GPU utilization in io.net dashboard

Multi-GPU and Distributed Training

io.net supports scaling from a single GPU to clusters of 100+ GPUs:

Cluster Configuration Options:

Setup	Use Case	Network	GPUs	Cost Example
Single GPU	LoRA fine-tuning, small models	N/A	1x A100	$1.20/hr
2-GPU NVLink	Medium model full fine-tune	NVLink	2x A100	$2.40/hr
8-GPU Node	Large model training (70B)	NVLink/NVSwitch	8x H100	$17.60/hr
Multi-Node	Pre-training, massive datasets	InfiniBand	64x H100	$140.80/hr

Distributed Training Patterns:

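# Distributed Data Parallel (DDP) - Replicate model on each GPU
# Best for: Models that fit on single GPU, large batch sizes
torchrun --nproc_per_node=8 train.py --distributed

# Fully Sharded Data Parallel (FSDP) - Shard model across GPUs
# Best for: Models too large for single GPU (30B+ parameters)
# Assumes the process group is already initialized (e.g. launched via torchrun)
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy
model = FSDP(model, auto_wrap_policy=size_based_auto_wrap_policy)

# Pipeline Parallel - Split model layers across GPUs
# Best for: Extremely large models (100B+), maximize throughput
# Pipe expects an nn.Sequential with its stages already placed on different GPUs
# and the RPC framework initialized. It is available in PyTorch 2.2; newer
# releases replace it with torch.distributed.pipelining.
from torch.distributed.pipeline.sync import Pipe
model = Pipe(model, chunks=8)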

# DeepSpeed ZeRO - Memory-optimized distributed training
# Best for: Limited GPU memory, 70B+ models on consumer GPUs
deepspeed train.py --deepspeed ds_config.json
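A quick way to choose between these patterns is to estimate how much memory the model states need per GPU. The sketch below assumes mixed-precision training with Adam, which takes roughly 16 bytes per parameter (2 for bf16 weights, 2 for bf16 gradients, 12 for fp32 master weights, momentum, and variance). It ignores activations and framework overhead, so treat the result as a lower bound; the function name is illustrative.

```python
# Bytes per parameter for mixed-precision Adam:
# 2 (bf16 weights) + 2 (bf16 gradients) + 12 (fp32 master copy, momentum, variance)
BYTES_PER_PARAM = 16


def model_state_gb(num_params: float, num_shards: int = 1) -> float:
    """Model-state memory per GPU in GB when states are sharded evenly.

    num_shards=1 models plain data parallelism (every GPU holds everything);
    num_shards=N models full sharding (FSDP or ZeRO-3) across N GPUs.
    Activations and temporary buffers are NOT included.
    """
    return num_params * BYTES_PER_PARAM / num_shards / 1e9


params_8b = 8e9
replicated = model_state_gb(params_8b)             # every GPU holds a full copy
sharded = model_state_gb(params_8b, num_shards=8)  # FSDP / ZeRO-3 on 8 GPUs

print(f"Replicated: {replicated:.0f} GB per GPU")       # 128 GB, over an 80 GB card
print(f"Sharded over 8 GPUs: {sharded:.0f} GB per GPU")  # 16 GB
```

Under these assumptions, full fine-tuning an 8B model does not fit on one 80 GB GPU with plain data parallelism, which is why the sharded approaches are listed for larger models.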

Cost Comparison: Training on io.net vs. Competitors

Llama 3 8B Fine-Tuning (50K examples, 3 epochs):

Provider	GPU	Configuration	Time	Total Cost
io.net	8x A100 80GB	FSDP	48 hrs	$573
AWS	8x A100 40GB	p4d.24xlarge	48 hrs	$1,574
Azure	8x A100 40GB	ND96asr_v4	48 hrs	$1,478
CoreWeave	8x A100 80GB	Reserved	48 hrs	$849
Savings vs. AWS				64%

Stable Diffusion Training (100K images, 50K steps):

Provider	GPU	Configuration	Time	Total Cost
io.net	4x RTX 4090	Data Parallel	72 hrs	$52
RunPod	4x RTX 4090	Spot	72 hrs	$86
Vast.ai	4x RTX 4090	Variable	72 hrs	$72-120
Lambda Labs	4x RTX 4090	(Sold out)	72 hrs	N/A
Savings vs. RunPod				40%

Why io.net is Optimized for AI Training

1. High-Bandwidth GPU Interconnects:
Multi-GPU training requires fast GPU-to-GPU communication. io.net clusters include:
- NVLink: up to 600 GB/s per GPU on A100 and 900 GB/s on H100 (vs. 64 GB/s for PCIe Gen4 x16)
- NVSwitch: Full all-to-all connectivity for 8-GPU nodes
- InfiniBand: 200-400 Gbps for multi-node training
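To see why interconnect bandwidth matters, consider the gradient synchronization that happens on every step of data-parallel training. The sketch below is an idealized estimate, assuming a ring all-reduce (each GPU moves about 2(N-1)/N times the payload) and treating the quoted 600 GB/s and 64 GB/s figures as fully usable bandwidth. Real throughput is lower, so the absolute times are optimistic; the ratio is the point.

```python
def ring_allreduce_seconds(payload_gb: float, num_gpus: int, bandwidth_gb_per_s: float) -> float:
    """Idealized ring all-reduce time.

    Each GPU sends and receives 2 * (N - 1) / N times the payload;
    latency and protocol overhead are ignored.
    """
    traffic_gb = 2 * (num_gpus - 1) / num_gpus * payload_gb
    return traffic_gb / bandwidth_gb_per_s


# bf16 gradients for an 8B-parameter model: 2 bytes per parameter = 16 GB
grad_gb = 8e9 * 2 / 1e9

nvlink_s = ring_allreduce_seconds(grad_gb, num_gpus=8, bandwidth_gb_per_s=600)
pcie_s = ring_allreduce_seconds(grad_gb, num_gpus=8, bandwidth_gb_per_s=64)

print(f"NVLink: {nvlink_s * 1000:.0f} ms per sync")
print(f"PCIe:   {pcie_s * 1000:.0f} ms per sync")
print(f"PCIe is {pcie_s / nvlink_s:.1f}x slower")
```

Because this cost is paid on every optimizer step, a slower link translates directly into idle GPU time unless communication is overlapped with compute.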

2. Fast Storage for Datasets:
Training performance bottlenecks often come from data loading, not GPU compute. io.net provides:
- NVMe SSD storage (6,000+ MB/s read speeds)
- Pre-cached common datasets (ImageNet, Common Crawl, The Pile)
- Direct S3/GCS integration for your custom data

3. Checkpoint and Resume:
Long training runs need fault tolerance. io.net supports:
- Automatic checkpointing every N steps
- Resume from last checkpoint on GPU failure
- Checkpoint storage included (no egress fees)

4. Experiment Tracking Integration:
Pre-integrated with Weights & Biases, TensorBoard, and MLflow for tracking:
- Training loss curves
- GPU utilization and memory
- Hyperparameter comparison
- Cost per experiment

5. Instant Scaling:
Start with 1 GPU for experimentation, scale to 8+ GPUs for production training:
- Add GPUs mid-run without restarting
- Auto-scaling based on queue depth
- Pay only for active training time

Common Training Scenarios and Recommendations

Scenario 1: Fine-tuning Llama 3 8B for a chatbot
- Recommended GPU: 1x A100 80GB ($1.20/hr)
- Method: QLoRA (LoRA with r=16 on a 4-bit quantized base model)
- Training time: 4-6 hours on 10K examples
- Total cost: ~$5-7 per experiment

Scenario 2: Training a custom Stable Diffusion model
- Recommended GPU: 2x RTX 4090 ($0.36/hr)
- Method: DreamBooth or fine-tuning
- Training time: 12-24 hours on 1K images
- Total cost: ~$4-9 per model

Scenario 3: Full fine-tuning of Llama 3 70B on proprietary data
- Recommended GPU: 8x H100 SXM ($17.60/hr)
- Method: FSDP + Flash Attention 2 + gradient checkpointing
- Training time: 3-5 days on 100K examples
- Total cost: ~$1,267-2,112 per run

Scenario 4: Pre-training a 7B model from scratch
- Recommended GPU: 32x H100 SXM ($70.40/hr)
- Method: DeepSpeed ZeRO-3 + pipeline parallel
- Training time: 2-4 weeks on 300B tokens
- Total cost: ~$23,654-47,309 (vs. $80K+ on AWS)
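Pre-training estimates like Scenario 4 can be sanity-checked with the common approximation that training takes about 6 FLOPs per parameter per token. The sketch below assumes an H100 SXM peak of 989 TFLOPS for dense BF16 and a model FLOPs utilization (MFU) of 40%. Both are assumptions, and real utilization depends heavily on the software stack and data pipeline.

```python
def pretraining_days(num_params: float, num_tokens: float, num_gpus: int,
                     peak_flops_per_gpu: float, mfu: float) -> float:
    """Estimate wall-clock days using the ~6 * params * tokens FLOPs rule of thumb."""
    total_flops = 6 * num_params * num_tokens
    sustained_flops_per_s = num_gpus * peak_flops_per_gpu * mfu
    return total_flops / sustained_flops_per_s / 86_400  # seconds per day


H100_BF16_PEAK = 989e12  # assumed dense BF16 peak for H100 SXM, in FLOPs per second
ASSUMED_MFU = 0.40       # assumed utilization; measure your own run to refine

days = pretraining_days(
    num_params=7e9,
    num_tokens=300e9,
    num_gpus=32,
    peak_flops_per_gpu=H100_BF16_PEAK,
    mfu=ASSUMED_MFU,
)
cost = days * 24 * 70.40  # 32x H100 at the $70.40/hr rate from Scenario 4

print(f"Estimated duration: {days:.1f} days, estimated cost: ${cost:,.0f}")
```

With these inputs the estimate comes out at roughly 11.5 days, a little under the 2-4 week range above. A lower assumed utilization lengthens the estimate proportionally, so the range in Scenario 4 corresponds to more conservative MFU values.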

How long does it take to train a Llama 3 model?

LoRA fine-tuning Llama 3 8B on 10K examples takes 4-6 hours on a single A100 80GB. Full fine-tuning of the same model on 50K examples takes 48-72 hours on 8x A100. Full fine-tuning of Llama 3 70B requires 8x H100 and takes 3-5 days. For reference, pre-training Llama 3 8B from scratch on 15 trillion tokens would cost roughly $2M in compute.

What's the difference between LoRA and full fine-tuning?

LoRA (Low-Rank Adaptation) freezes the base model and trains small adapter matrices that amount to only 0.1-1% of the model's parameters, reducing memory usage by 3-4x and training time by 50-70%. It costs $5-10 per experiment vs. $500-1000 for full fine-tuning. Use LoRA for most use cases (chatbots, domain adaptation, instruction following). Use full fine-tuning only when you need maximum model quality and have 50K+ high-quality training examples.
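The small trainable fraction can be worked out by hand. For each adapted weight matrix of shape d_out x d_in, LoRA adds two matrices with r x (d_in + d_out) parameters in total. The sketch below applies that to the r=16, q_proj/v_proj setup from Step 2, assuming the published Llama 3 8B dimensions (hidden size 4096, 32 layers, 8 key/value heads of dimension 128) and a total of roughly 8.03B parameters.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Assumed Llama 3 8B dimensions
HIDDEN = 4096
KV_DIM = 8 * 128   # grouped-query attention: 8 KV heads x head dimension 128
NUM_LAYERS = 32
BASE_PARAMS = 8.03e9
RANK = 16

per_layer = (
    lora_param_count(HIDDEN, HIDDEN, RANK)    # q_proj: 4096 -> 4096
    + lora_param_count(HIDDEN, KV_DIM, RANK)  # v_proj: 4096 -> 1024
)
trainable = per_layer * NUM_LAYERS
fraction = trainable / BASE_PARAMS * 100

print(f"Trainable LoRA parameters: {trainable:,}")  # 6,815,744
print(f"Share of base model: {fraction:.3f}%")
```

Adapting only two projections at r=16 lands near the low end of the quoted range; targeting more modules or raising the rank moves the share up.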

Can I pause and resume training jobs?

Yes. io.net supports checkpoint-based training where your model state is saved every N steps (configurable). If a GPU fails or you stop the job, you can resume from the last checkpoint without losing progress. For long training runs (72+ hours), enable automatic checkpointing every 500-1000 steps. Checkpoints are stored in persistent storage with no egress fees.
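The pattern behind checkpoint-based resume is independent of any platform. The sketch below is a framework-agnostic illustration using JSON files and a stand-in for the training step; in a real PyTorch job you would save model and optimizer state with torch.save instead. All function names and the checkpoint layout here are hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(path, step, state):
    """Write to a temp file, then rename, so a crash never corrupts the last good checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp_path, path)  # atomic rename


def load_checkpoint(path):
    """Return (step, state) from the last checkpoint, or a fresh start if none exists."""
    if not os.path.exists(path):
        return 0, {"loss_sum": 0.0}
    with open(path) as f:
        ckpt = json.load(f)
    return ckpt["step"], ckpt["state"]


def train(path, total_steps, checkpoint_every, stop_after=None):
    """Run (or resume) training; stop_after simulates an interruption."""
    step, state = load_checkpoint(path)
    while step < total_steps:
        if stop_after is not None and step >= stop_after:
            break  # simulated failure or manual stop
        state["loss_sum"] += 1.0 / (step + 1)  # stand-in for one real training step
        step += 1
        if step % checkpoint_every == 0:
            save_checkpoint(path, step, state)
    return step, state


with tempfile.TemporaryDirectory() as workdir:
    ckpt_path = os.path.join(workdir, "ckpt.json")
    train(ckpt_path, total_steps=2000, checkpoint_every=500, stop_after=750)
    resumed_from, _ = load_checkpoint(ckpt_path)
    final_step, _ = train(ckpt_path, total_steps=2000, checkpoint_every=500)
    print(f"Resumed from step {resumed_from}, finished at step {final_step}")
```

An interruption at step 750 loses only the work since step 500, which is why the checkpoint interval is a trade-off between storage and I/O overhead on one side and recomputed steps on the other.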

Do I need to manage GPU clusters myself?

No. io.net handles cluster orchestration automatically. When you request 8x H100 GPUs, the platform provisions a cluster with proper networking (NVLink/InfiniBand), configures distributed training frameworks, and handles GPU health monitoring. You just run your training script with standard distributed training commands (torchrun, accelerate launch, deepspeed). For advanced users, Kubernetes deployments are also supported.

What happens if my training job fails mid-run?

io.net automatically detects GPU failures and can either (1) migrate your job to a healthy GPU cluster or (2) resume from the last checkpoint on a new cluster. You're only charged for successful compute time. For fault tolerance, enable checkpointing in your training script and io.net will store checkpoints in persistent storage. Most training runs complete successfully, but for critical multi-day jobs, checkpoint every 1-2 hours.

Start Training on io.net Today

Get 68% cost savings on GPU training compared to AWS:
- GPUs available instantly - H100, A100, RTX 4090, L40S
- Pre-configured environments for PyTorch, TensorFlow, HuggingFace
- Distributed training with NVLink, NVSwitch, InfiniBand
- Per-second billing - pay only for active training time

Browse GPU inventory → or launch your first training job →

Last updated: April 2026 | Training benchmarks based on standard configurations with io.net optimized containers