How to Rent a GPU in the Cloud: Step-by-Step Setup Tutorial

Renting GPUs in the cloud enables AI development without $300K capital expenditures on hardware. Whether you're training language models, fine-tuning computer vision systems, or running batch inference, cloud GPU rental provides instant access to H100, A100, and other high-performance hardware at hourly rates. This beginner-friendly tutorial walks through the complete process: choosing a provider, selecting GPU types, deploying your first cluster, running training jobs, and optimizing costs.

Why Rent Cloud GPUs Instead of Buying?

Capital efficiency: $0 upfront vs $300K for DGX H100

Instant scalability: 1 to 100+ GPUs in minutes

Latest hardware: Access H100, future H200 without purchase cycles

Elastic costs: Pay only when using, scale to zero when idle

Global deployment: Train in US, Europe, Asia without building data centers

No maintenance: Provider handles power, cooling, hardware failures
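
To put the capital-efficiency point in numbers, the sketch below estimates how many rental hours add up to the purchase price. It assumes a DGX H100 holds 8 GPUs and uses the $4/hr H100 rate quoted in this tutorial. Power, cooling, staffing, and resale value are ignored, so treat the result as an order-of-magnitude figure only.

```python
# Break-even sketch: renting vs. buying (illustrative figures only)
PURCHASE_PRICE_USD = 300_000   # DGX H100 figure used in this article
GPUS_PER_SYSTEM = 8            # assumption: one DGX H100 contains 8 H100 GPUs
RENTAL_RATE_PER_GPU_HR = 4.0   # H100 hourly rate used in this article


def break_even_hours(purchase_price: float, gpus: int, rate_per_gpu_hr: float) -> float:
    """Hours of renting `gpus` GPUs that cost as much as buying outright."""
    return purchase_price / (gpus * rate_per_gpu_hr)


hours = break_even_hours(PURCHASE_PRICE_USD, GPUS_PER_SYSTEM, RENTAL_RATE_PER_GPU_HR)
months_at_full_load = hours / 720  # 720 hours = one 30-day month of 24/7 use
print(f"Break-even after {hours:,.0f} rental hours (~{months_at_full_load:.1f} months at 24/7)")
```

Under these assumptions the break-even point is 9,375 hours of 8-GPU rental, or roughly 13 months of continuous use. Workloads that run only part of the time take proportionally longer to reach it.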

Step 1: Choose Your Cloud GPU Provider

Beginner Recommendation: io.net

Why io.net for first-time renters:

Simplest onboarding (5 minutes signup → running GPUs)
Cheapest pricing (70% less than AWS)
Instant availability (no waitlists or quota requests)
Pre-configured containers (no CUDA installation needed)
$100 free trial (test before committing)

Pricing: $4/hr for H100, $2.50/hr for A100, $1/hr for RTX 4090

Alternative Providers

AWS EC2: Most mature but 3x more expensive, complex setup
GCP Compute Engine: Competitive with AWS pricing, limited H100 availability
Azure Virtual Machines: Enterprise-friendly, similar pricing to AWS/GCP

For beginners: Start with io.net for lowest cost and fastest setup. Graduate to AWS/GCP if you need their managed services later.

Step 2: Select the Right GPU Type

GPU Selection Matrix

For fine-tuning models <7B parameters:

Best: RTX 4090 ($1/hr) - Sufficient performance, lowest cost
Good: A100 40GB ($1.80/hr) - More memory if needed

For training models 7-70B parameters:

Best: A100 80GB ($2.50/hr) - Sweet spot price/performance
Fast: H100 ($4/hr) - 3x faster, worth it for time-sensitive projects

For foundation models >70B parameters:

Required: H100 SXM ($4/hr) - Only viable option for reasonable training times

For inference:

Low volume: RTX 4090 ($1/hr)
High volume: H100 ($4/hr) for 3x throughput
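
The training and fine-tuning rows of the matrix can be written as a small lookup. This is only a sketch of the rules listed above: the function name `recommend_gpu` is hypothetical, and the size boundaries are taken directly from the matrix rather than from any benchmark.

```python
def recommend_gpu(params_billions: float) -> str:
    """Map model size (billions of parameters) to the matrix's first-choice GPU."""
    if params_billions <= 0:
        raise ValueError("params_billions must be positive")
    if params_billions < 7:
        return "RTX 4090"    # fine-tuning models under 7B
    if params_billions <= 70:
        return "A100 80GB"   # training models in the 7-70B range
    return "H100 SXM"        # foundation models above 70B


for size in (3, 13, 175):
    print(f"{size}B parameters -> {recommend_gpu(size)}")
```

Treat the result as a starting point: memory needs also depend on batch size, sequence length, and optimizer, so a quick test run on the cheaper card is still worthwhile.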

How Many GPUs Do You Need?

Single node (1-8 GPUs):

Fine-tuning small models
Development and experimentation
Inference workloads

Multi-GPU cluster (8-64 GPUs):

Training medium-large models (7-70B params)
Distributed training for speed
High-throughput inference

Large cluster (64+ GPUs):

Foundation model training
Production training pipelines

Step 3: Create an Account and Add Credits

3a. Create Account

Visit cloud.io.net
Click "Sign Up"
Enter email and password
Verify email (check inbox for confirmation link)

Time: 2 minutes

3b. Add Credits

Payment options:

Credit card: Visa, Mastercard, Amex (instant)
Cryptocurrency: USDC, USDT, ETH, BTC (5-15 min confirmation)
Free trial: $100 credits, no card required

Recommended: Start with free $100 trial to test platform

Time: 2-5 minutes depending on payment method

Step 4: Deploy Your First GPU Cluster

Method A: Web Dashboard (Easiest)

Step 4a-1: Navigate to cloud.io.net/clusters

Step 4a-2: Click "New Cluster"

Step 4a-3: Configure cluster

Name: my-first-cluster
GPU Type: H100 SXM (dropdown menu)
Quantity: 1 (start small to test)
Region: Auto (or select preferred location)

Step 4a-4: Click "Launch Cluster"

Result: Cluster provisions in <2 minutes. Dashboard shows:

Cluster status: Running
GPU utilization: 0% (waiting for workload)
Connection details: SSH command, kubectl config
Current cost: $4/hr

Time: 5 minutes total

Method B: Command Line (Faster for Repeat Use)

# Install io.net CLI
pip install ionet-cli

# Authenticate
ionet login
# Enter your email/password

# Deploy cluster
ionet cluster create \
  --name my-first-cluster \
  --gpu h100-sxm \
  --count 1

# Wait ~90 seconds for provisioning
# Output shows cluster ID and connection info

Time: 3 minutes total

Step 5: Access Your GPU Cluster

Method A: SSH Access

# Get SSH command from dashboard or:
ionet cluster ssh my-first-cluster

# You're now connected to the GPU server
# Verify GPUs available:
nvidia-smi

# Output shows:
# GPU 0: NVIDIA H100 80GB
# Memory: 80GB / 80GB
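
If you would rather run the same check from a script, for example at the top of a training job, the sketch below wraps `nvidia-smi` using its `--query-gpu` option and parses the CSV output. The helper names are hypothetical, and the script simply reports no GPUs when `nvidia-smi` is not installed.

```python
import shutil
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"]


def parse_gpu_lines(csv_text: str) -> list[tuple[str, str]]:
    """Turn 'name, memory' CSV lines into (name, memory) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, memory = (field.strip() for field in line.split(",", 1))
        gpus.append((name, memory))
    return gpus


def list_gpus() -> list[tuple[str, str]]:
    """Return visible GPUs, or an empty list if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return parse_gpu_lines(result.stdout)


if __name__ == "__main__":
    gpus = list_gpus()
    if not gpus:
        print("No NVIDIA GPU visible (is nvidia-smi on PATH?)")
    for index, (name, memory) in enumerate(gpus):
        print(f"GPU {index}: {name} ({memory})")
```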

Method B: Jupyter Notebook

# Launch Jupyter on cluster
ionet cluster jupyter my-first-cluster

# Opens browser with Jupyter interface
# Start coding immediately

Method C: Kubernetes (Advanced)

# Get kubeconfig
ionet cluster kubeconfig my-first-cluster > ~/.kube/ionet-config

# Point kubectl at the downloaded config, then deploy
export KUBECONFIG=~/.kube/ionet-config
kubectl apply -f my-training-job.yaml
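
The contents of my-training-job.yaml are not shown in this tutorial. As a rough sketch of what a minimal GPU job might contain, the Python below builds a Kubernetes Job manifest and prints it as JSON, which kubectl accepts in place of YAML. The job name and container image are hypothetical placeholders. `nvidia.com/gpu` is the resource name exposed by the NVIDIA device plugin; whether an io.net cluster exposes GPUs under that name is an assumption to verify.

```python
import json


def training_job_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Kubernetes Job that requests `gpus` NVIDIA GPUs."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # do not automatically retry a failed training run
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "trainer",
                            "image": image,  # hypothetical image that contains train.py
                            "command": ["python", "train.py"],
                            "resources": {"limits": {"nvidia.com/gpu": gpus}},
                        }
                    ],
                }
            },
        },
    }


manifest = training_job_manifest("llama-finetune", "registry.example.com/trainer:latest")
print(json.dumps(manifest, indent=2))
```

Redirect the output to a file such as my-training-job.json and pass that file to kubectl apply -f.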

Step 6: Run Your First Training Job

Example: Fine-Tune LLaMA 2 7B

6a. Prepare Training Script

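# train.py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

# Gated model: accept the license on Hugging Face and log in before downloading
MODEL_ID = "meta-llama/Llama-2-7b-hf"

# Load tokenizer and model (bf16 halves memory use compared with the fp32 default)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
# Trainer moves the model to the GPU itself, so no explicit .cuda() call is needed

# Load dataset
# ... (your data loading code; it must define train_dataset)

# Train
training_args = TrainingArguments(output_dir="./checkpoints", bf16=True)
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()

# Save model and tokenizer together
trainer.save_model("./fine-tuned-llama")
tokenizer.save_pretrained("./fine-tuned-llama")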

6b. Run Training

# SSH into cluster
ionet cluster ssh my-first-cluster

# Install dependencies
pip install transformers accelerate torch  # Trainer depends on accelerate

# Run training
python train.py

# Monitor GPU utilization
nvidia-smi dmon -s u

6c. Monitor Progress

# In separate terminal, check costs
ionet billing summary

# Shows:
# Current spend: $12.40 (3.1 hours × $4/hr)
# Projected monthly: $2,880 (if running 24/7)

Step 7: Retrieve Your Trained Model

Option 1: Direct Download via SSH

# On your local machine
scp -r ionet-cluster:/path/to/fine-tuned-llama ./

Option 2: Upload to S3 from Cluster

# On cluster
pip install boto3
python upload_to_s3.py  # Your upload script

# No egress fees with io.net (unlike AWS)
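
The upload script itself is left to the reader. One possible shape for upload_to_s3.py is sketched below, using boto3's `upload_file` call. The bucket name and key prefix are supplied on the command line and are entirely your own choices. Credentials are assumed to come from the usual AWS environment variables or config files.

```python
import sys
from pathlib import Path


def iter_upload_pairs(local_dir: str, key_prefix: str):
    """Yield (local_path, s3_key) for every file under local_dir."""
    root = Path(local_dir)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            yield str(path), f"{key_prefix.rstrip('/')}/{relative}"


def upload_directory(local_dir: str, bucket: str, key_prefix: str) -> int:
    """Upload a directory tree to S3 and return the number of files sent."""
    import boto3  # imported lazily so the path helper works without boto3 installed

    s3 = boto3.client("s3")
    count = 0
    for local_path, key in iter_upload_pairs(local_dir, key_prefix):
        s3.upload_file(local_path, bucket, key)
        count += 1
    return count


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python upload_to_s3.py <local_dir> <bucket> <key_prefix>
    uploaded = upload_directory(sys.argv[1], sys.argv[2], sys.argv[3])
    print(f"Uploaded {uploaded} files")
```

For example: python upload_to_s3.py ./fine-tuned-llama my-model-bucket models/fine-tuned-llama, where my-model-bucket is a placeholder for a bucket you own.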

Option 3: Keep on Cluster Storage

# Models persist on cluster storage
# Access them next time you scale up cluster

Step 8: Shut Down to Stop Charges

IMPORTANT: GPU clusters charge per hour while running

Method A: Delete Cluster

ionet cluster delete my-first-cluster

# Charges stop immediately
# Data is saved and can be restored

Method B: Scale to Zero

ionet cluster scale my-first-cluster --count 0

# $0/hr charges while scaled to zero
# Scale back up instantly when needed

Cost impact:

Forgot to shut down: $4/hr × 720 hrs/month = $2,880 wasted
Remember to shut down: $4/hr × 20 hrs actual use = $80

Common Beginner Mistakes (and How to Avoid Them)

Mistake 1: Forgetting to Shut Down

Symptom: Unexpected $2,880 monthly bill

Solution: Set budget alerts

ionet budget set --limit 100 --alert-at 80%
# Email alert at $80 spend
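
It helps to translate a dollar budget into hours of runtime, so you know how long a forgotten cluster can run before the alert fires. The sketch below uses the $100 limit and 80% threshold from the example above with the $4/hr H100 rate. The helper name `hours_until` is hypothetical.

```python
def hours_until(spend_limit_usd: float, hourly_rate_usd: float,
                already_spent_usd: float = 0.0) -> float:
    """Hours of runtime left before total spend reaches spend_limit_usd."""
    remaining = max(spend_limit_usd - already_spent_usd, 0.0)
    return remaining / hourly_rate_usd


LIMIT_USD = 100.0      # budget limit from the example above
ALERT_FRACTION = 0.80  # alert threshold from the example above
RATE_USD_PER_HR = 4.0  # one H100 at the rate quoted in this article

alert_hours = hours_until(LIMIT_USD * ALERT_FRACTION, RATE_USD_PER_HR)
limit_hours = hours_until(LIMIT_USD, RATE_USD_PER_HR)
print(f"Alert after {alert_hours:.0f} h, limit reached after {limit_hours:.0f} h")
```

With these figures the alert arrives after 20 hours of single-H100 runtime and the limit is reached after 25, so a cluster left on over a weekend would hit both.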

Mistake 2: Choosing Wrong GPU Type

Symptom: Fine-tuning a model under 7B parameters on an H100 ($4/hr) when an RTX 4090 ($1/hr) would work

Solution: Start small, upgrade if needed

Mistake 3: Not Using Mixed Precision

Symptom: Training takes 20 hours instead of 10 hours (2x cost)

Solution:

import torch

# bf16 autocast needs an Ampere-or-newer GPU (A100, H100, RTX 4090)
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    output = model(inputs)

Impact: 2x faster = 50% cost savings

Mistake 4: Downloading Large Datasets Repeatedly

Symptom: Wasting GPU time downloading 100GB dataset every training run

Solution: Keep datasets on cluster storage or use S3 directly
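
One simple way to avoid repeat downloads is to fetch the dataset only when it is not already on cluster storage. The sketch below shows that pattern with the standard library. `ensure_dataset` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for whatever download code you actually use.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_dataset(target: Path, fetch: Callable[[Path], None]) -> bool:
    """Call fetch() only if target is missing; return True if a fetch happened."""
    if target.exists():
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    partial = target.with_name(target.name + ".partial")
    fetch(partial)          # write under a temporary name first
    partial.rename(target)  # so an interrupted download is never treated as complete
    return True


def fake_fetch(path: Path) -> None:
    """Stand-in for a real download (hypothetical)."""
    path.write_bytes(b"example data")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        dataset = Path(workdir) / "datasets" / "train.bin"
        print(ensure_dataset(dataset, fake_fetch))  # first call fetches the file
        print(ensure_dataset(dataset, fake_fetch))  # second call reuses it
```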

Mistake 5: Over-Provisioning GPUs

Symptom: Using 8 GPUs when 4 would suffice (2x cost for minimal speedup)

Solution: Benchmark scaling efficiency before large deployments
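
A scaling benchmark comes down to three numbers: speedup, efficiency, and relative cost. The sketch below computes them from wall-clock timings of the same workload on 1 GPU and on N GPUs. The timings in the example are made up for illustration; substitute your own measurements.

```python
def scaling_report(baseline_seconds: float, n_gpus: int, n_gpu_seconds: float) -> dict:
    """Compare an N-GPU run against a 1-GPU baseline of the same workload."""
    speedup = baseline_seconds / n_gpu_seconds
    return {
        "speedup": speedup,                # how many times faster than 1 GPU
        "efficiency": speedup / n_gpus,    # 1.0 would be perfect linear scaling
        "relative_cost": n_gpus / speedup, # job cost relative to the 1-GPU run
    }


# Hypothetical timings for one epoch: 1000 s on 1 GPU, 280 s on 4, 200 s on 8
for gpus, seconds in ((4, 280.0), (8, 200.0)):
    report = scaling_report(1000.0, gpus, seconds)
    print(f"{gpus} GPUs: {report['speedup']:.2f}x speedup, "
          f"{report['efficiency']:.0%} efficiency, "
          f"{report['relative_cost']:.2f}x cost")
```

In this made-up example, going from 4 to 8 GPUs lowers efficiency from about 89% to about 63%, so the job finishes sooner but costs noticeably more overall.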

Cost Optimization Tips

Tip 1: Use Cheapest Sufficient GPU

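An RTX 4090 ($1/hr) is often sufficient for fine-tuning small models. Its hourly rate is 75% lower than an H100's ($4/hr), though the saving per job is smaller if the job runs longer on the slower card.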
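
What matters in the end is cost per job, not cost per hour. The sketch below compares the two cards for a job that takes 10 hours on an H100. The 3x slowdown on the RTX 4090 is purely an assumption for illustration; measure your own workload before deciding.

```python
def cost_per_job(hourly_rate_usd: float, job_hours: float) -> float:
    """Total cost of one job at a given hourly rate."""
    return hourly_rate_usd * job_hours


H100_RATE = 4.0         # $/hr, from this article
RTX4090_RATE = 1.0      # $/hr, from this article
H100_JOB_HOURS = 10.0   # hypothetical job length on an H100
ASSUMED_SLOWDOWN = 3.0  # assumption: the same job takes 3x as long on an RTX 4090

h100_cost = cost_per_job(H100_RATE, H100_JOB_HOURS)
rtx_cost = cost_per_job(RTX4090_RATE, H100_JOB_HOURS * ASSUMED_SLOWDOWN)
savings = 1 - rtx_cost / h100_cost
break_even_slowdown = H100_RATE / RTX4090_RATE  # slowdown at which costs are equal

print(f"H100: ${h100_cost:.0f}, RTX 4090: ${rtx_cost:.0f}, saving {savings:.0%}")
print(f"The RTX 4090 stays cheaper as long as it is less than {break_even_slowdown:.0f}x slower")
```

Under this assumption the per-job saving is 25% rather than 75%, and the cheaper card wins on cost only while its slowdown stays below the ratio of the two hourly rates.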

Tip 2: Enable FP16/BF16 Mixed Precision

2x faster training = 50% cost reduction

Tip 3: Batch Multiple Experiments

Run 5 experiments back-to-back on same cluster vs spinning up/down 5 times = save setup overhead

Tip 4: Use Spot/Preemptible for Fault-Tolerant Jobs

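Spot capacity can be reclaimed at short notice, so reserve it for jobs that checkpoint and resume. AWS spot H100 capacity is quoted at $45-60/hr (vs io.net's standard $4/hr); confirm whether such a quote is per GPU or per multi-GPU instance before comparing rates.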

Tip 5: Scale to Zero Between Experiments

Pay only during active training, not during result analysis

Troubleshooting

Issue: Out of Memory Error

Solution: Reduce batch size or use gradient accumulation

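from torch.utils.data import DataLoader

# Reduce batch size
train_loader = DataLoader(dataset, batch_size=16)  # Was 32

# Or use gradient accumulation (assumes a Hugging Face-style model that returns .loss)
accum_steps = 4
optimizer.zero_grad()
for i, batch in enumerate(train_loader):
    loss = model(**batch).loss / accum_steps  # Scale so accumulated gradients average
    loss.backward()
    if (i + 1) % accum_steps == 0:  # Update once every 4 batches
        optimizer.step()
        optimizer.zero_grad()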
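
Gradient accumulation works because gradients from several small batches add up to the gradient of one large batch, provided each micro-batch loss is divided by the number of accumulation steps. The pure-Python sketch below checks this for a one-parameter model y = w * x with mean squared error. The data values are arbitrary.

```python
import math


def mse_grad(w: float, xs: list, ys: list) -> float:
    """Gradient of mean squared error with respect to w for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
w = 0.5
micro_batch_size = 2
accum_steps = len(xs) // micro_batch_size

# Gradient computed on the full batch of 8 samples at once
full_grad = mse_grad(w, xs, ys)

# Same gradient built from micro-batches, each scaled by 1 / accum_steps
accumulated = 0.0
for step in range(accum_steps):
    start = step * micro_batch_size
    chunk_x = xs[start:start + micro_batch_size]
    chunk_y = ys[start:start + micro_batch_size]
    accumulated += mse_grad(w, chunk_x, chunk_y) / accum_steps

assert math.isclose(full_grad, accumulated)
print(f"effective batch size: {micro_batch_size * accum_steps}")
```

The effective batch size is the micro-batch size times the number of accumulation steps (8 here), while peak memory is set by the micro-batch alone.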

Issue: Slow Training

Diagnosis: Check GPU utilization

nvidia-smi dmon -s u
# If <70%, you have a bottleneck

Common causes:

Data loading too slow → increase num_workers
Small batch size → increase batch size
CPU preprocessing → move to GPU
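
To confirm that data loading is the bottleneck, time the loader on its own, without the model. The sketch below measures how many batches per second any iterable can produce. `slow_batches` is a hypothetical stand-in for a real DataLoader.

```python
import time
from typing import Iterable


def batches_per_second(batches: Iterable, max_batches: int = 50) -> float:
    """Time how quickly an iterable yields batches, with no model in the loop."""
    start = time.perf_counter()
    count = 0
    for _ in batches:
        count += 1
        if count >= max_batches:
            break
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def slow_batches(delay_seconds: float = 0.01):
    """Stand-in for a slow DataLoader (hypothetical): one batch per delay."""
    while True:
        time.sleep(delay_seconds)
        yield [0] * 16


rate = batches_per_second(slow_batches(), max_batches=20)
print(f"{rate:.0f} batches/s from the loader alone")
```

If the loader alone delivers fewer batches per second than your training loop consumes when fed from memory, more workers or cheaper preprocessing will help more than a faster GPU.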

Issue: Connection Lost

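Solution: Start training inside tmux or screen so it keeps running when the SSH connection drops, then reconnect via SSH and reattach to the session. A job started in a plain SSH shell is usually killed when the connection closes.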

# Use tmux or screen to persist sessions
tmux new -s training
python train.py
# Detach: Ctrl+b, then d
# Reattach later: tmux attach -t training

Next Steps After First GPU Rental

Beginner → Intermediate

Learn:

Distributed training across multiple GPUs
Container-based deployment (Docker)
Checkpoint management and resume training
Experiment tracking (Weights & Biases, MLflow)

Practice:

Train progressively larger models (7B → 13B → 70B)
Optimize training efficiency (FP16, gradient accumulation, Flash Attention)
Benchmark GPU scaling (1 vs 4 vs 8 GPUs)

Intermediate → Advanced