Renting GPUs in the cloud enables AI development without $300K capital expenditures on hardware. Whether you're training language models, fine-tuning computer vision systems, or running batch inference, cloud GPU rental provides instant access to H100, A100, and other high-performance hardware at hourly rates. This beginner-friendly tutorial walks through the complete process: choosing a provider, selecting GPU types, deploying your first cluster, running training jobs, and optimizing costs.
Why Rent Cloud GPUs Instead of Buying?
Capital efficiency: $0 upfront vs $300K for DGX H100
Instant scalability: 1 to 100+ GPUs in minutes
Latest hardware: Access H100, future H200 without purchase cycles
Elastic costs: Pay only when using, scale to zero when idle
Global deployment: Train in US, Europe, Asia without building data centers
No maintenance: Provider handles power, cooling, hardware failures
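To put the capital-efficiency point in numbers, here is a rough break-even sketch. It uses the $300K DGX H100 figure above and the $4/hr H100 rate quoted later in this guide. It assumes the DGX holds 8 GPUs and ignores power, cooling, staffing, and resale value, so read the result as a ballpark rather than a full cost model.

```python
# Rough break-even between buying and renting (illustrative assumptions only).
PURCHASE_PRICE = 300_000   # USD, DGX H100 figure from this guide
GPUS_PER_SYSTEM = 8        # assumption: a DGX H100 holds 8 GPUs
RENTAL_RATE = 4.00         # USD per GPU-hour, H100 rate quoted in this guide
HOURS_PER_MONTH = 720      # 24 hours x 30 days


def break_even_hours(price, gpus, rate):
    """Hours of running all GPUs at which rental spend equals the purchase price."""
    return price / (gpus * rate)


hours = break_even_hours(PURCHASE_PRICE, GPUS_PER_SYSTEM, RENTAL_RATE)
print(f"Break-even after {hours:,.0f} hours "
      f"(~{hours / HOURS_PER_MONTH:.1f} months of 24/7 use)")
```

If your GPUs would sit idle most of the time, the break-even point moves out accordingly, which is the case where renting tends to win.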
Step 1: Choose Your Cloud GPU Provider
Beginner Recommendation: io.net
Why io.net for first-time renters:
- Simplest onboarding (5 minutes signup → running GPUs)
- Cheapest pricing (70% less than AWS)
- Instant availability (no waitlists or quota requests)
- Pre-configured containers (no CUDA installation needed)
- $100 free trial (test before committing)
Pricing: $4/hr for H100, $2.50/hr for A100, $1/hr for RTX 4090
Alternative Providers
AWS EC2: Most mature but 3x more expensive, complex setup
GCP Compute Engine: Competitive with AWS pricing, limited H100 availability
Azure Virtual Machines: Enterprise-friendly, similar pricing to AWS/GCP
For beginners: Start with io.net for lowest cost and fastest setup. Graduate to AWS/GCP if you need their managed services later.
Step 2: Select the Right GPU Type
GPU Selection Matrix
For fine-tuning models <7B parameters:
- Best: RTX 4090 ($1/hr) - Sufficient performance, lowest cost
- Good: A100 40GB ($1.80/hr) - More memory if needed
For training models 7-70B parameters:
- Best: A100 80GB ($2.50/hr) - Sweet spot price/performance
- Fast: H100 ($4/hr) - 3x faster, worth it for time-sensitive projects
For foundation models >70B parameters:
- Required: H100 SXM ($4/hr) - Only viable option for reasonable training times
For inference:
- Low volume: RTX 4090 ($1/hr)
- High volume: H100 ($4/hr) for 3x throughput
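Hourly price alone can mislead, because a faster GPU finishes sooner. The sketch below compares cost per job (hourly rate times hours needed) using the rates above and the "3x faster" figure for the H100. That speedup depends on the workload and the 30-hour job length is hypothetical, so measure your own job before relying on it.

```python
# Cost per job = hourly rate x hours needed. Speeds are relative to the A100 80GB.
GPUS = {
    # name: (USD per hour, relative speed vs A100 80GB)
    "A100 80GB": (2.50, 1.0),
    "H100": (4.00, 3.0),  # "3x faster" figure from this guide; workload-dependent
}


def job_cost(rate_per_hour, relative_speed, baseline_hours):
    """Cost of a job that takes baseline_hours on the baseline GPU."""
    return rate_per_hour * baseline_hours / relative_speed


BASELINE_HOURS = 30  # hypothetical job length on one A100 80GB
for name, (rate, speed) in GPUS.items():
    print(f"{name}: ${job_cost(rate, speed, BASELINE_HOURS):.2f} per job")
```

If the full 3x speedup holds for your workload, the H100 ends up cheaper per job despite the higher hourly rate. If the real speedup is below 1.6x (the ratio of the two rates), the A100 is the cheaper choice.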
How Many GPUs Do You Need?
Single node (1-8 GPUs):
- Fine-tuning small models
- Development and experimentation
- Inference workloads
Multi-GPU cluster (8-64 GPUs):
- Training medium-large models (7-70B params)
- Distributed training for speed
- High-throughput inference
Large cluster (64+ GPUs):
- Foundation model training
- Production training pipelines
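GPU count is often dictated by memory before speed. The sketch below uses two common rules of thumb, which are assumptions rather than measurements: about 2 bytes per parameter for FP16 inference, and about 16 bytes per parameter for full fine-tuning with Adam in mixed precision. Activations and framework overhead are excluded, so real requirements are higher.

```python
import math

# Rule-of-thumb bytes per parameter (assumptions, not measurements).
BYTES_PER_PARAM = {
    "inference_fp16": 2,       # FP16/BF16 weights only
    "full_finetune_adam": 16,  # mixed-precision weights + gradients + Adam states
}


def memory_gb(params_billions, mode):
    """Approximate GB for weights and optimizer state, excluding activations."""
    # 1e9 parameters x bytes per parameter / 1e9 bytes per GB
    return params_billions * BYTES_PER_PARAM[mode]


def min_gpus(params_billions, mode, gpu_memory_gb=80):
    """Smallest GPU count whose combined memory holds the estimate (ideal sharding)."""
    return math.ceil(memory_gb(params_billions, mode) / gpu_memory_gb)


for size in (7, 13, 70):
    need = memory_gb(size, "full_finetune_adam")
    gpus = min_gpus(size, "full_finetune_adam")
    print(f"{size}B full fine-tune: ~{need} GB, at least {gpus} x 80GB GPUs")
```

Parameter-efficient methods such as LoRA need far less memory than full fine-tuning, which is why small cards can still handle fine-tuning of smaller models.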
Step 3: Sign Up and Add Credits (io.net Example)
3a. Create Account
- Visit cloud.io.net
- Click "Sign Up"
- Enter email and password
- Verify email (check inbox for confirmation link)
Time: 2 minutes
3b. Add Credits
Payment options:
- Credit card: Visa, Mastercard, Amex (instant)
- Cryptocurrency: USDC, USDT, ETH, BTC (5-15 min confirmation)
- Free trial: $100 credits, no card required
Recommended: Start with free $100 trial to test platform
Time: 2-5 minutes depending on payment method
Step 4: Deploy Your First GPU Cluster
Method A: Web Dashboard (Easiest)
Step 4a-1: Navigate to cloud.io.net/clusters
Step 4a-2: Click "New Cluster"
Step 4a-3: Configure cluster
- Name: my-first-cluster
- GPU Type: H100 SXM (dropdown menu)
- Quantity: 1 (start small to test)
- Region: Auto (or select preferred location)
Step 4a-4: Click "Launch Cluster"
Result: Cluster provisions in <2 minutes. Dashboard shows:
- Cluster status: Running
- GPU utilization: 0% (waiting for workload)
- Connection details: SSH command, kubectl config
- Current cost: $4/hr
Time: 5 minutes total
Method B: Command Line (Faster for Repeat Use)
# Install io.net CLI
pip install ionet-cli
# Authenticate
ionet login
# Enter your email/password
# Deploy cluster
ionet cluster create \
--name my-first-cluster \
--gpu h100-sxm \
--count 1
# Wait ~90 seconds for provisioning
# Output shows cluster ID and connection info
Time: 3 minutes total
Step 5: Access Your GPU Cluster
Method A: SSH Access
# Get SSH command from dashboard or:
ionet cluster ssh my-first-cluster
# You're now connected to GPU server
# Verify GPUs available:
nvidia-smi
# Output shows something like:
# GPU 0: NVIDIA H100 80GB
# Memory: 0GB used / 80GB total (nothing is loaded yet)
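If you want to check the GPUs from a script rather than by eye, nvidia-smi can emit CSV. The sketch below parses that output. The query flags are standard nvidia-smi options, while the sample row is made up for illustration and not captured from a real cluster.

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=name,memory.total,memory.used",
         "--format=csv,noheader,nounits"]


def parse_gpu_csv(text):
    """Turn nvidia-smi CSV rows into dicts; memory values are in MiB."""
    gpus = []
    for line in text.strip().splitlines():
        name, total, used = [field.strip() for field in line.split(",")]
        gpus.append({"name": name, "total_mib": int(total), "used_mib": int(used)})
    return gpus


def query_gpus():
    """Run nvidia-smi on the current machine and parse its output."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)


# Illustrative sample row (not real output from a cluster)
sample = "NVIDIA H100 80GB HBM3, 81559, 0\n"
for i, gpu in enumerate(parse_gpu_csv(sample)):
    free = gpu["total_mib"] - gpu["used_mib"]
    print(f"GPU {i}: {gpu['name']}, {free} MiB free of {gpu['total_mib']} MiB")
```

On the cluster itself, call query_gpus() instead of parsing the sample string.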
Method B: Jupyter Notebook
# Launch Jupyter on cluster
ionet cluster jupyter my-first-cluster
# Opens browser with Jupyter interface
# Start coding immediately
Method C: Kubernetes (Advanced)
# Get kubeconfig
ionet cluster kubeconfig my-first-cluster > ~/.kube/ionet-config
# Deploy via kubectl, pointing it at the config you just saved
kubectl --kubeconfig ~/.kube/ionet-config apply -f my-training-job.yaml
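The job manifest itself is not shown above. As a rough sketch, the Python below builds a minimal Kubernetes Job and prints it as JSON, which kubectl accepts as well as YAML. It assumes the cluster exposes GPUs through the standard nvidia.com/gpu resource. The image name is hypothetical and the job name is only illustrative.

```python
import json


def training_job_manifest(name, image, command, gpus=1):
    """Build a Kubernetes batch/v1 Job that requests NVIDIA GPUs."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # do not retry a failed training run automatically
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "trainer",
                        "image": image,
                        "command": command,
                        # Assumes the NVIDIA device plugin is installed on the cluster
                        "resources": {"limits": {"nvidia.com/gpu": gpus}},
                    }],
                }
            },
        },
    }


manifest = training_job_manifest(
    name="my-training-job",
    image="registry.example.com/my-trainer:latest",  # hypothetical image
    command=["python", "train.py"],
)
print(json.dumps(manifest, indent=2))
```

Redirect the output to a file such as my-training-job.json and pass that file to kubectl apply -f.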
Step 6: Run Your First Training Job
Example: Fine-Tune LLaMA 2 7B
6a. Prepare Training Script
# train.py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer
# Load model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = model.cuda() # Move to GPU
# Load dataset
# ... (your data loading code)
# Train
trainer = Trainer(model=model, ...)
trainer.train()
# Save
model.save_pretrained("./fine-tuned-llama")
6b. Run Training
# SSH into cluster
ionet cluster ssh my-first-cluster
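# Install dependencies (Trainer needs accelerate)
pip install transformers torch accelerate
# Llama 2 weights are gated: request access on Hugging Face, then authenticate
huggingface-cli login
# Run training
python train.py
# Monitor GPU utilization (from a second SSH session, since train.py blocks this one)
nvidia-smi dmon -s u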
6c. Monitor Progress
# In separate terminal, check costs
ionet billing summary
# Shows:
# Current spend: $12.40 (3.1 hours × $4/hr)
# Projected monthly: $2,880 (if running 24/7)
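The numbers in a billing summary are plain rate-times-hours arithmetic, so you can sanity-check them yourself. A minimal sketch, assuming per-GPU hourly billing and the 720-hour month used in this guide:

```python
HOURS_PER_MONTH = 720  # 24 hours x 30 days, the convention used in this guide


def current_spend(hours_used, rate_per_hour, gpu_count=1):
    """Spend so far for a cluster billed per GPU-hour."""
    return round(hours_used * rate_per_hour * gpu_count, 2)


def projected_monthly(rate_per_hour, gpu_count=1):
    """Spend if the cluster is left running for a full month."""
    return round(HOURS_PER_MONTH * rate_per_hour * gpu_count, 2)


print(f"Spend so far: ${current_spend(3.1, 4.00):.2f}")        # one H100 for 3.1 hours
print(f"Projected monthly: ${projected_monthly(4.00):,.2f}")   # one H100 running 24/7
```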
Step 7: Retrieve Your Trained Model
Option 1: Direct Download via SSH
# On your local machine
scp -r ionet-cluster:/path/to/fine-tuned-llama ./
Option 2: Upload to S3 from Cluster
# On cluster
pip install boto3
python upload_to_s3.py # Your upload script
# No egress fees with io.net (unlike AWS)
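The upload script is left to the reader above. One possible sketch of upload_to_s3.py follows. It uses the boto3 S3 client's upload_file method. The bucket name and key prefix are hypothetical placeholders, and AWS credentials are assumed to be configured in the environment.

```python
import os


def upload_directory(s3_client, local_dir, bucket, prefix=""):
    """Upload every file under local_dir to the bucket, keeping relative paths."""
    uploaded = []
    for root, _dirs, files in os.walk(local_dir):
        for filename in files:
            path = os.path.join(root, filename)
            relative = os.path.relpath(path, local_dir).replace(os.sep, "/")
            key = f"{prefix.rstrip('/')}/{relative}" if prefix else relative
            s3_client.upload_file(path, bucket, key)  # boto3 S3 client method
            uploaded.append(key)
    return uploaded


def main():
    import boto3  # imported here so upload_directory can be tested without AWS access

    client = boto3.client("s3")  # credentials come from the environment or ~/.aws
    keys = upload_directory(
        client,
        "./fine-tuned-llama",   # output directory from the training script
        "my-model-bucket",      # hypothetical bucket name
        "llama-2-7b",           # hypothetical key prefix
    )
    print(f"Uploaded {len(keys)} files")
```

Call main() at the bottom of the script to run the upload. Passing the client in as an argument keeps the directory walk easy to test with a stand-in object.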
Option 3: Keep on Cluster Storage
# Models persist on cluster storage
# Access them next time you scale up cluster
Step 8: Shut Down to Stop Charges
IMPORTANT: A running cluster is billed by the hour, even when its GPUs are idle
Method A: Delete Cluster
ionet cluster delete my-first-cluster
# Charges stop immediately
# Data is saved and can be restored
Method B: Scale to Zero
ionet cluster scale my-first-cluster --count 0
# $0/hr charges while scaled to zero
# Scale back up instantly when needed
Cost impact:
- Forgot to shut down: $4/hr × 720 hrs/month = $2,880 wasted
- Remember to shut down: $4/hr × 20 hrs actual use = $80

Common Beginner Mistakes (and How to Avoid Them)
Mistake 1: Forgetting to Shut Down
Symptom: Unexpected $2,880 monthly bill
Solution: Set budget alerts
ionet budget set --limit 100 --alert-at 80%
# Email alert at $80 spend
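It helps to know how much runtime a budget actually buys. The sketch below converts the limit and alert threshold from the command above into hours, assuming per-GPU hourly billing at the $4/hr H100 rate.

```python
def hours_until(amount_usd, rate_per_hour, gpu_count=1):
    """Hours of runtime before spend reaches amount_usd."""
    return amount_usd / (rate_per_hour * gpu_count)


LIMIT = 100            # USD, matches --limit 100
ALERT_FRACTION = 0.80  # matches --alert-at 80%
RATE = 4.00            # USD per hour for one H100

alert_at = LIMIT * ALERT_FRACTION
print(f"Alert at ${alert_at:.0f} after {hours_until(alert_at, RATE):.1f} hours")
print(f"Limit of ${LIMIT} reached after {hours_until(LIMIT, RATE):.1f} hours")
```

The same budget on an 8-GPU cluster lasts one eighth as long, so size the limit to the cluster and not just to the experiment.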
Mistake 2: Choosing Wrong GPU Type
Symptom: Fine-tuning 7B model on H100 ($4/hr) when RTX 4090 ($1/hr) would work
Solution: Start small, upgrade if needed
Mistake 3: Not Using Mixed Precision
Symptom: Training takes 20 hours instead of 10 hours (2x cost)
Solution:
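import torch
# torch.autocast replaces the deprecated torch.cuda.amp.autocast
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    output = model(inputs)
Impact: up to ~2x faster on tensor-core GPUs, which means up to ~50% cost savings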
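For context, here is what a complete mixed-precision training step can look like. This is a toy sketch: the linear model and random data are placeholders, and BF16 is used so that no gradient scaler is needed (FP16 training would normally add one). It falls back to CPU when no GPU is present so that it runs anywhere.

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy model and data, stand-ins for a real network and dataset
model = nn.Linear(16, 4).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(8, 16, device=device)
targets = torch.randn(8, 4, device=device)


def train_step():
    """One optimizer step with the forward pass run under BF16 autocast."""
    optimizer.zero_grad()
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        outputs = model(inputs)  # matmul runs in BF16
        loss = nn.functional.mse_loss(outputs.float(), targets)  # loss in FP32
    loss.backward()   # backward runs outside the autocast context
    optimizer.step()  # weights stay in FP32
    return loss.item(), outputs.dtype


loss_value, forward_dtype = train_step()
print(f"loss={loss_value:.4f}, forward dtype={forward_dtype}")
```

When using the Hugging Face Trainer, the equivalent is setting bf16=True (or fp16=True) in TrainingArguments rather than writing the context manager yourself.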
Mistake 4: Downloading Large Datasets Repeatedly
Symptom: Wasting GPU time downloading 100GB dataset every training run
Solution: Keep datasets on cluster storage or use S3 directly
Mistake 5: Over-Provisioning GPUs
Symptom: Using 8 GPUs when 4 would suffice (2x cost for minimal speedup)
Solution: Benchmark scaling efficiency before large deployments
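A scaling benchmark boils down to three numbers per GPU count: speedup, efficiency, and total cost. The sketch below computes them. The timings are hypothetical and stand in for your own measurements of the same job at each size.

```python
def scaling_report(gpu_count, baseline_hours, measured_hours, rate_per_gpu_hour):
    """Compare a multi-GPU run against a measured single-GPU baseline."""
    speedup = baseline_hours / measured_hours
    efficiency = speedup / gpu_count  # 1.0 means perfect linear scaling
    cost = gpu_count * measured_hours * rate_per_gpu_hour
    return {"speedup": speedup, "efficiency": efficiency, "cost": cost}


# Hypothetical benchmark timings for the same job, in hours
timings = {1: 40.0, 4: 11.0, 8: 8.0}
for gpus, hours in timings.items():
    r = scaling_report(gpus, timings[1], hours, rate_per_gpu_hour=4.00)
    print(f"{gpus} GPUs: {r['speedup']:.1f}x speedup, "
          f"{r['efficiency']:.0%} efficiency, ${r['cost']:.0f} total")
```

With these made-up timings, going from 4 to 8 GPUs saves three hours but raises the total bill, which is the pattern to watch for before committing to a larger cluster.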
Cost Optimization Tips
Tip 1: Use Cheapest Sufficient GPU
An RTX 4090 ($1/hr) is often sufficient for fine-tuning. Compared with an H100 ($4/hr), that is a 75% lower hourly rate.
Tip 2: Enable FP16/BF16 Mixed Precision
Up to ~2x faster training, so up to ~50% lower cost
Tip 3: Batch Multiple Experiments
Run 5 experiments back-to-back on the same cluster instead of spinning it up and down 5 times, which saves the repeated setup overhead
Tip 4: Use Spot/Preemptible for Fault-Tolerant Jobs
AWS spot H100 capacity runs $45-60/hr, compared with io.net's standard rate of $4/hr, so io.net's standard pricing is cheaper than AWS spot.
Tip 5: Scale to Zero Between Experiments
Pay only during active training, not during result analysis
Troubleshooting
Issue: Out of Memory Error
Solution: Reduce batch size or use gradient accumulation
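# Reduce batch size
train_loader = DataLoader(dataset, batch_size=16) # Was 32
# Or use gradient accumulation: 4 micro-batches per optimizer step
# (assumes a Hugging Face-style model whose output has a .loss attribute)
accum_steps = 4
for i, batch in enumerate(train_loader):
    loss = model(**batch).loss / accum_steps  # scale so accumulated gradients average out
    loss.backward()  # gradients add up in .grad across iterations
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()  # reset before the next group of micro-batches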
Issue: Slow Training
Diagnosis: Check GPU utilization
nvidia-smi dmon -s u
# If utilization stays below 70%, something other than the GPU is the bottleneck
Common causes:
- Data loading too slow → increase num_workers
- Small batch size → increase batch size
- CPU preprocessing → move to GPU
Issue: Connection Lost
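Solution: A job started in a plain SSH session is usually killed when the connection drops. Start it inside tmux or screen so it keeps running in the background, then reconnect via SSH and reattach.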
# Use tmux or screen to persist sessions
tmux new -s training
python train.py
# Detach: Ctrl+b, then d
# Reattach later: tmux attach -t training
Next Steps After First GPU Rental
Beginner → Intermediate
Learn:
- Distributed training across multiple GPUs
- Container-based deployment (Docker)
- Checkpoint management and resume training
- Experiment tracking (Weights & Biases, MLflow)
Practice:
- Train progressively larger models (7B → 13B → 70B)
- Optimize training efficiency (FP16, gradient accumulation, Flash Attention)
- Benchmark GPU scaling (1 vs 4 vs 8 GPUs)
Intermediate → Advanced
Learn:
- Multi-node distributed training (64+ GPUs)
- Custom CUDA kernels for optimization
- Production deployment pipelines
- Cost-optimized hybrid cloud architectures
Practice:
- Train foundation models from scratch
- Implement efficient inference systems
- Build MLOps automation (CI/CD for models)
Conclusion
Renting GPUs in the cloud is straightforward with the right provider. io.net offers the simplest onboarding (5 minutes to running GPUs), the cheapest pricing (70% less than AWS), and instant availability, which makes it a good fit for first-time renters.
Quick start checklist:
- Sign up at cloud.io.net (2 min)
- Deploy H100 cluster (~2 min)
- Run training job (your code)
- Shut down when done (critical!)
Total time: <10 minutes from zero to training on H100 GPUs