Renting GPUs in the cloud enables AI development without $300K capital expenditures on hardware. Whether you're training language models, fine-tuning computer vision systems, or running batch inference, cloud GPU rental provides instant access to H100, A100, and other high-performance hardware at hourly rates. This beginner-friendly tutorial walks through the complete process: choosing a provider, selecting GPU types, deploying your first cluster, running training jobs, and optimizing costs.

Why Rent Cloud GPUs Instead of Buying?

Capital efficiency: $0 upfront vs $300K for DGX H100

Instant scalability: 1 to 100+ GPUs in minutes

Latest hardware: Access H100 now, and newer GPUs such as H200 as providers add them, without purchase cycles

Elastic costs: Pay only when using, scale to zero when idle

Global deployment: Train in US, Europe, Asia without building data centers

No maintenance: Provider handles power, cooling, hardware failures
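To put the capital-efficiency point in numbers, here is a rough break-even sketch. It assumes a DGX H100 holds 8 GPUs and uses the $4/hr per-H100 rental rate quoted later in this guide. It ignores power, cooling, staffing, and resale value on the purchase side, so treat the result as an order-of-magnitude estimate only.

```python
# Rough break-even estimate: renting 8 H100s vs buying one DGX H100.
# Assumptions: 8 GPUs per DGX, $4/hr per rented H100,
# and no operating costs on the purchased system.
PURCHASE_PRICE = 300_000      # USD, the figure used in this guide
GPUS_PER_DGX = 8              # assumption: standard DGX H100 configuration
RENTAL_RATE_PER_GPU = 4.0     # USD per GPU-hour

def break_even_hours(purchase_price: float, gpus: int, rate_per_gpu: float) -> float:
    """Hours of renting `gpus` GPUs that cost as much as buying outright."""
    return purchase_price / (gpus * rate_per_gpu)

hours = break_even_hours(PURCHASE_PRICE, GPUS_PER_DGX, RENTAL_RATE_PER_GPU)
months_24x7 = hours / 720     # 720 hours = 30 days of continuous use
print(f"Break-even after {hours:,.0f} hours (~{months_24x7:.1f} months of 24/7 use)")
```

Under these assumptions, renting costs less than buying until roughly 13 months of continuous 8-GPU use; with intermittent use, the break-even point moves much further out.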

Step 1: Choose Your Cloud GPU Provider

Beginner Recommendation: io.net

Why io.net for first-time renters:

  • Simplest onboarding (5 minutes signup → running GPUs)
  • Cheapest pricing (70% less than AWS)
  • Instant availability (no waitlists or quota requests)
  • Pre-configured containers (no CUDA installation needed)
  • $100 free trial (test before committing)

Pricing: $4/hr for H100, $2.50/hr for A100, $1/hr for RTX 4090

Alternative Providers

AWS EC2: Most mature but 3x more expensive, complex setup
GCP Compute Engine: Competitive with AWS pricing, limited H100 availability
Azure Virtual Machines: Enterprise-friendly, similar pricing to AWS/GCP

For beginners: Start with io.net for lowest cost and fastest setup. Graduate to AWS/GCP if you need their managed services later.

Step 2: Select the Right GPU Type

GPU Selection Matrix

For fine-tuning models <7B parameters:

  • Best: RTX 4090 ($1/hr) - Sufficient performance, lowest cost
  • Good: A100 40GB ($1.80/hr) - More memory if needed

For training models 7-70B parameters:

  • Best: A100 80GB ($2.50/hr) - Sweet spot price/performance
  • Fast: H100 ($4/hr) - Up to 3x faster than A100 depending on the workload, worth it for time-sensitive projects

For foundation models >70B parameters:

  • Required: H100 SXM ($4/hr) - Only viable option for reasonable training times

For inference:

  • Low volume: RTX 4090 ($1/hr)
  • High volume: H100 ($4/hr) for 3x throughput
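When choosing between tiers, cost per job matters more than cost per hour. The sketch below uses the hourly rates from the matrix above and the 3x H100 speedup quoted there; the 30-hour job length is a hypothetical value, and real speedups vary by model and software stack.

```python
# Compare cost per job rather than cost per hour.
# Rates come from this guide; the speedup is the guide's claim, not a measurement.
def job_cost(baseline_hours: float, rate_per_hour: float, speedup: float = 1.0) -> float:
    """Cost of a job that takes `baseline_hours` on the baseline GPU."""
    return baseline_hours / speedup * rate_per_hour

baseline_hours = 30.0  # hypothetical job length on an A100 80GB
a100_cost = job_cost(baseline_hours, rate_per_hour=2.50)
h100_cost = job_cost(baseline_hours, rate_per_hour=4.00, speedup=3.0)
print(f"A100: ${a100_cost:.2f}  H100: ${h100_cost:.2f}")

# Speedup at which the H100 costs the same per job as the A100
break_even_speedup = 4.00 / 2.50
print(f"H100 is cheaper per job once it is more than {break_even_speedup:.1f}x faster")
```

If your workload sees less than a 1.6x speedup on H100, the A100 is the cheaper choice per job at these rates.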

How Many GPUs Do You Need?

Single node (1-8 GPUs):

  • Fine-tuning small models
  • Development and experimentation
  • Inference workloads

Multi-GPU cluster (8-64 GPUs):

  • Training medium-large models (7-70B params)
  • Distributed training for speed
  • High-throughput inference

Large cluster (64+ GPUs):

  • Foundation model training
  • Production training pipelines
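A quick way to sanity-check cluster size is to estimate how much memory the model state needs. The sketch below uses a common rule of thumb of about 16 bytes per parameter for full training with mixed-precision Adam (2 for weights, 2 for gradients, 12 for optimizer state). It assumes the state is sharded evenly across GPUs (for example with FSDP or ZeRO) and ignores activations, so it gives a lower bound, not a sizing guarantee. Parameter-efficient methods such as LoRA need far less.

```python
import math

BYTES_PER_PARAM = 16      # assumption: mixed-precision Adam, full fine-tuning or pre-training
GPU_MEMORY_GB = 80        # A100 80GB or H100 80GB

def min_gpus_for_training(params_billions: float, gpu_memory_gb: float = GPU_MEMORY_GB) -> int:
    """Lower bound on GPUs needed to hold model state, ignoring activations."""
    state_gb = params_billions * BYTES_PER_PARAM   # 1B params x 16 bytes = 16 GB
    return math.ceil(state_gb / gpu_memory_gb)

for size in (7, 13, 70):
    print(f"{size}B params: at least {min_gpus_for_training(size)} x 80GB GPUs")
```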

Step 3: Sign Up and Add Credits (io.net Example)

3a. Create Account

  1. Visit cloud.io.net
  2. Click "Sign Up"
  3. Enter email and password
  4. Verify email (check inbox for confirmation link)

Time: 2 minutes

3b. Add Credits

Payment options:

  • Credit card: Visa, Mastercard, Amex (instant)
  • Cryptocurrency: USDC, USDT, ETH, BTC (5-15 min confirmation)
  • Free trial: $100 credits, no card required

Recommended: Start with free $100 trial to test platform

Time: 2-5 minutes depending on payment method

Step 4: Deploy Your First GPU Cluster

Method A: Web Dashboard (Easiest)

Step 4a-1: Navigate to cloud.io.net/clusters

Step 4a-2: Click "New Cluster"

Step 4a-3: Configure cluster

  • Name: my-first-cluster
  • GPU Type: H100 SXM (dropdown menu)
  • Quantity: 1 (start small to test)
  • Region: Auto (or select preferred location)

Step 4a-4: Click "Launch Cluster"

Result: Cluster provisions in <2 minutes. Dashboard shows:

  • Cluster status: Running
  • GPU utilization: 0% (waiting for workload)
  • Connection details: SSH command, kubectl config
  • Current cost: $4/hr

Time: 5 minutes total

Method B: Command Line (Faster for Repeat Use)

# Install io.net CLI
pip install ionet-cli

# Authenticate
ionet login
# Enter your email/password

# Deploy cluster
ionet cluster create \
  --name my-first-cluster \
  --gpu h100-sxm \
  --count 1

# Wait ~90 seconds for provisioning
# Output shows cluster ID and connection info

Time: 3 minutes total

Step 5: Access Your GPU Cluster

Method A: SSH Access

# Get SSH command from dashboard or:
ionet cluster ssh my-first-cluster

# You're now connected to GPU server
# Verify GPUs available:
nvidia-smi

# Output shows:
# GPU 0: NVIDIA H100 80GB
# Memory: 80GB / 80GB
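
The same check can be scripted, which is handy at the top of a training job. This is a minimal sketch that relies on nvidia-smi's `--query-gpu` CSV output; the helper names are illustrative, and the sample line is made up rather than captured from a real cluster.

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"]

def parse_gpu_list(csv_text: str) -> list[tuple[str, str]]:
    """Turn 'name, memory' CSV lines into (name, memory) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, memory = (field.strip() for field in line.split(",", 1))
        gpus.append((name, memory))
    return gpus

def list_gpus() -> list[tuple[str, str]]:
    """Run nvidia-smi on the current machine and return the GPUs it reports."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_list(result.stdout)

# Example with an illustrative (not captured) output line:
sample = "NVIDIA H100 80GB HBM3, 81559 MiB\n"
print(parse_gpu_list(sample))
```

On the cluster, call `list_gpus()` and stop early if the list is empty or shorter than the GPU count you are paying for.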

Method B: Jupyter Notebook

# Launch Jupyter on cluster
ionet cluster jupyter my-first-cluster

# Opens browser with Jupyter interface
# Start coding immediately

Method C: Kubernetes (Advanced)

# Get kubeconfig
ionet cluster kubeconfig my-first-cluster > ~/.kube/ionet-config

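# Deploy via kubectl, pointing it at the downloaded config
# (kubectl ignores ~/.kube/ionet-config unless told to use it)
kubectl --kubeconfig ~/.kube/ionet-config apply -f my-training-job.yaml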

Step 6: Run Your First Training Job

Example: Fine-Tune LLaMA 2 7B

6a. Prepare Training Script

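# train.py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer

# Llama 2 is a gated model: accept the license on Hugging Face and authenticate first
model_name = "meta-llama/Llama-2-7b-hf"

# Load tokenizer and model; BF16 halves weight memory versus the FP32 default
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
# No explicit .cuda() call: Trainer moves the model to the GPU

# Load dataset
# ... (your data loading code)

# Train
trainer = Trainer(model=model, ...)
trainer.train()

# Save model and tokenizer together so the checkpoint can be reloaded on its own
model.save_pretrained("./fine-tuned-llama")
tokenizer.save_pretrained("./fine-tuned-llama")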

6b. Run Training

# SSH into cluster
ionet cluster ssh my-first-cluster

# Install dependencies (Trainer requires accelerate)
pip install torch transformers accelerate

# Run training
python train.py

# Monitor GPU utilization
nvidia-smi dmon -s u

6c. Monitor Progress

# In separate terminal, check costs
ionet billing summary

# Shows:
# Current spend: $12.40 (3.1 hours × $4/hr)
# Projected monthly: $2,880 (if running 24/7)
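
The arithmetic behind a billing summary like this is easy to reproduce yourself, which helps when estimating a job before launching it. A minimal sketch, assuming a flat hourly rate per GPU and a 720-hour month; the function names are illustrative and not part of any CLI.

```python
HOURS_PER_MONTH = 24 * 30   # 720, the convention used in this guide

def spend_so_far(hours_run: float, rate_per_hour: float, gpus: int = 1) -> float:
    """Cost accrued after `hours_run` hours on `gpus` GPUs."""
    return hours_run * rate_per_hour * gpus

def projected_monthly(rate_per_hour: float, gpus: int = 1) -> float:
    """Cost if the cluster is left running for a full 30-day month."""
    return HOURS_PER_MONTH * rate_per_hour * gpus

print(f"Current spend: ${spend_so_far(5.0, 4.0):.2f}")          # 5 hours on one H100
print(f"Projected monthly: ${projected_monthly(4.0):,.0f}")     # one H100 running 24/7
```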

Step 7: Retrieve Your Trained Model

Option 1: Direct Download via SSH

# On your local machine ("ionet-cluster" stands for the user@host from your cluster's SSH command)
scp -r ionet-cluster:/path/to/fine-tuned-llama ./

Option 2: Upload to S3 from Cluster

# On cluster
pip install boto3
python upload_to_s3.py  # Your upload script

# No egress fees with io.net (unlike AWS)

Option 3: Keep on Cluster Storage

# Models persist on cluster storage
# Access them next time you scale up cluster

Step 8: Shut Down to Stop Charges

IMPORTANT: GPU clusters charge per hour while running

Method A: Delete Cluster

ionet cluster delete my-first-cluster

# Charges stop immediately
# Data is saved and can be restored

Method B: Scale to Zero

ionet cluster scale my-first-cluster --count 0

# $0/hr charges while scaled to zero
# Scale back up instantly when needed

Cost impact:

  • Forgot to shut down: $4/hr × 720 hrs/month = $2,880 wasted
  • Remember to shut down: $4/hr × 20 hrs actual use = $80

Common Beginner Mistakes (and How to Avoid)

Mistake 1: Forgetting to Shut Down

Symptom: Unexpected $2,880 monthly bill

Solution: Set budget alerts

ionet budget set --limit 100 --alert-at 80%
# Email alert at $80 spend
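
It helps to translate a budget into running hours so you know how much time an alert actually buys. A small sketch using the $100 limit and 80% alert threshold from the command above, assuming one H100 at $4/hr:

```python
def hours_until(amount: float, rate_per_hour: float) -> float:
    """Hours of runtime that `amount` dollars pays for at a flat hourly rate."""
    return amount / rate_per_hour

limit = 100.0          # USD budget
alert_fraction = 0.80  # alert at 80% of the budget
rate = 4.0             # USD/hr for one H100 (assumed cluster size: 1 GPU)

alert_hours = hours_until(limit * alert_fraction, rate)
limit_hours = hours_until(limit, rate)
print(f"Alert after {alert_hours:.0f} h, budget exhausted after {limit_hours:.0f} h")
```

With an 8-GPU cluster the same budget lasts about 3 hours, so set alerts with the cluster size in mind.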

Mistake 2: Choosing Wrong GPU Type

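Symptom: Fine-tuning a 7B model on H100 ($4/hr) when RTX 4090 ($1/hr) would work (on the 4090's 24GB, a 7B model fits only with parameter-efficient methods such as LoRA or QLoRA)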

Solution: Start small, upgrade if needed

Mistake 3: Not Using Mixed Precision

Symptom: Training takes 20 hours instead of 10 hours (2x cost)

Solution:

import torch

# torch.cuda.amp.autocast is deprecated; torch.autocast is the current API
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    output = model(inputs)

Impact: up to about 2x faster on tensor-core GPUs, which means up to 50% cost savings for the same job
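
The link between speedup and savings is not linear, so it is worth computing for the speedup you actually measure. At a fixed hourly rate, a job that runs s times faster costs 1/s as much. A short sketch of that relationship:

```python
def cost_savings(speedup: float) -> float:
    """Fraction of cost saved when a job runs `speedup` times faster at the same hourly rate."""
    return 1.0 - 1.0 / speedup

for s in (1.3, 1.5, 2.0):
    print(f"{s:.1f}x faster -> {cost_savings(s):.0%} cheaper")
```

A measured 1.5x speedup therefore saves about a third of the cost, not half.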

Mistake 4: Downloading Large Datasets Repeatedly

Symptom: Wasting GPU time downloading 100GB dataset every training run

Solution: Keep datasets on cluster storage or use S3 directly

Mistake 5: Over-Provisioning GPUs

Symptom: Using 8 GPUs when 4 would suffice (2x cost for minimal speedup)

Solution: Benchmark scaling efficiency before large deployments
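
One way to run such a benchmark is to time the same short job at two or more GPU counts and compare speedup against cost. The sketch below does that comparison; the timings are hypothetical placeholders, not measured data, and the $4/hr rate is the H100 price used in this guide.

```python
def scaling_report(hours_by_gpu_count: dict[int, float], rate_per_gpu_hour: float) -> list[dict]:
    """Compute speedup, efficiency, and cost per job relative to the smallest run."""
    base_gpus = min(hours_by_gpu_count)
    base_hours = hours_by_gpu_count[base_gpus]
    rows = []
    for gpus, hours in sorted(hours_by_gpu_count.items()):
        speedup = base_hours / hours
        rows.append({
            "gpus": gpus,
            "speedup": speedup,
            "efficiency": speedup / (gpus / base_gpus),  # 1.0 means perfect scaling
            "cost": gpus * hours * rate_per_gpu_hour,
        })
    return rows

# Hypothetical benchmark timings (hours per training run)
timings = {4: 10.0, 8: 8.0}
for row in scaling_report(timings, rate_per_gpu_hour=4.0):
    print(row)
```

In this made-up case, doubling to 8 GPUs gives only a 1.25x speedup (62.5% efficiency) and raises the cost per job from $160 to $256.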

Cost Optimization Tips

Tip 1: Use Cheapest Sufficient GPU

RTX 4090 ($1/hr) often sufficient for fine-tuning vs H100 ($4/hr) = 75% savings

Tip 2: Enable FP16/BF16 Mixed Precision

Up to about 2x faster training = up to 50% cost reduction

Tip 3: Batch Multiple Experiments

Run 5 experiments back-to-back on same cluster vs spinning up/down 5 times = save setup overhead

Tip 4: Use Spot/Preemptible for Fault-Tolerant Jobs

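Spot and preemptible instances are discounted but can be interrupted at any time, so they suit jobs that checkpoint and resume. Compare per-GPU prices before assuming spot is cheapest: the AWS spot H100 figure of $45-60/hr appears to be for an 8-GPU instance (roughly $5.60-7.50 per GPU-hour), which is still above io.net's $4/hr standard rate.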

Tip 5: Scale to Zero Between Experiments

Pay only during active training, not during result analysis

Troubleshooting

Issue: Out of Memory Error

Solution: Reduce batch size or use gradient accumulation

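from torch.utils.data import DataLoader

# Reduce batch size
train_loader = DataLoader(dataset, batch_size=16)  # Was 32

# Or use gradient accumulation: 4 micro-batches per optimizer step
# (assumes a Hugging Face model whose output has .loss and batches that include labels)
accum_steps = 4
optimizer.zero_grad()
for i, batch in enumerate(train_loader):
    loss = model(**batch).loss / accum_steps  # Scale so gradients average over the steps
    loss.backward()                           # Gradients add up across micro-batches
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()                 # Reset before the next accumulation window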
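
Gradient accumulation works because averaging the gradients of equal-sized micro-batches reproduces the gradient of the full batch. The sketch below checks this numerically for a one-parameter linear model in plain Python; the data points and starting weight are made up for illustration.

```python
def grad_mse(w: float, xs: list[float], ys: list[float]) -> float:
    """Gradient of mean((w*x - y)^2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 15.9]
w = 0.5
accum_steps = 4
micro = len(xs) // accum_steps   # micro-batch size of 2

# Gradient computed on the full batch of 8 in one pass
full_grad = grad_mse(w, xs, ys)

# Same gradient built from 4 micro-batches of 2
accumulated = 0.0
for i in range(accum_steps):
    lo, hi = i * micro, (i + 1) * micro
    # Dividing by accum_steps mirrors scaling the loss before backward()
    accumulated += grad_mse(w, xs[lo:hi], ys[lo:hi]) / accum_steps

print(abs(full_grad - accumulated) < 1e-9)
```

The two values match up to floating-point rounding, which is why accumulation lets a small-memory GPU train with the same effective batch size.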

Issue: Slow Training

Diagnosis: Check GPU utilization

nvidia-smi dmon -s u
# If utilization stays below ~70%, something other than the GPU is the bottleneck

Common causes:

  • Data loading too slow → increase num_workers
  • Small batch size → increase batch size
  • CPU preprocessing → move to GPU
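
The utilization check can also be scripted so it runs alongside training. This is a minimal sketch built on nvidia-smi's `--query-gpu=utilization.gpu` output; the function names are illustrative, the 70% threshold is the rule of thumb from this guide, and the sample reading is made up.

```python
import subprocess

UTIL_QUERY = ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"]

def parse_utilization(text: str) -> list[int]:
    """One integer percentage per GPU, in the order nvidia-smi lists them."""
    return [int(line) for line in text.split()]

def underused_gpus(utilization: list[int], threshold: int = 70) -> list[int]:
    """Indices of GPUs below the threshold, which suggests an input or CPU bottleneck."""
    return [i for i, pct in enumerate(utilization) if pct < threshold]

def check_now() -> list[int]:
    """Sample the local GPUs once (requires nvidia-smi on PATH)."""
    out = subprocess.run(UTIL_QUERY, capture_output=True, text=True, check=True).stdout
    return underused_gpus(parse_utilization(out))

# Illustrative reading for a 4-GPU node, not captured output
print(underused_gpus(parse_utilization("95\n42\n88\n10\n")))
```

A single sample is noisy, so in practice call `check_now()` several times over a minute or two before concluding there is a bottleneck.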

Issue: Connection Lost

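Solution: A dropped SSH session normally terminates the processes started in it, so launch training inside tmux or screen. The job then keeps running on the cluster, and you can reconnect via SSH and reattach.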

# Use tmux or screen to persist sessions
tmux new -s training
python train.py
# Detach: Ctrl+b, then d
# Reattach later: tmux attach -t training

Next Steps After First GPU Rental

Beginner → Intermediate

Learn:

  • Distributed training across multiple GPUs
  • Container-based deployment (Docker)
  • Checkpoint management and resume training
  • Experiment tracking (Weights & Biases, MLflow)

Practice:

  • Train progressively larger models (7B → 13B → 70B)
  • Optimize training efficiency (FP16, gradient accumulation, Flash Attention)
  • Benchmark GPU scaling (1 vs 4 vs 8 GPUs)

Intermediate → Advanced

Learn:

  • Multi-node distributed training (64+ GPUs)
  • Custom CUDA kernels for optimization
  • Production deployment pipelines
  • Cost-optimized hybrid cloud architectures

Practice:

  • Train foundation models from scratch
  • Implement efficient inference systems
  • Build MLOps automation (CI/CD for models)

Conclusion

Renting GPUs in the cloud is straightforward with the right provider. io.net offers the simplest onboarding (5 minutes to running GPUs), cheapest pricing (70% less than AWS), and instant availability, which makes it a good fit for first-time renters.

Quick start checklist:

  1. Sign up at cloud.io.net (2 min)
  2. Deploy H100 cluster (under 2 min)
  3. Run training job (your code)
  4. Shut down when done (critical!)

Total time: <10 minutes from zero to training on H100 GPUs
