FAQ: How Much Does an H100 Cost Per Hour?

Quick Answer

The NVIDIA H100 GPU costs $1.49-$2.20 per hour on io.net, depending on whether you choose the PCIe ($1.49/hr) or SXM ($2.20/hr) variant. That is roughly 68-72% less than AWS ($4.99-$6.98/hr) and Azure ($5.40-$7.20/hr), and 54-56% less than CoreWeave ($3.39-$4.76/hr). The H100 is NVIDIA's flagship data center GPU, offering up to 3x faster training than the A100 and up to 9x faster inference through its Transformer Engine. For large-scale LLM training, an 8-GPU H100 SXM cluster on io.net costs $17.60/hr vs. $55.84/hr on AWS - a savings of $38.24/hr, or $27,533/month at 24/7 usage.

H100 Pricing Across All Major Providers

Here's what you'll pay to rent NVIDIA H100 GPUs across the cloud GPU market:

H100 SXM5 (700W TDP, NVLink, Multi-GPU Training)

Provider	Price/Hour (per GPU)	Availability	Instance Type	Monthly Cost per GPU (24/7)
io.net	$2.20	Good	On-demand	$1,584
AWS	$6.98	Limited	p5.48xlarge (8x H100)	$5,026
Azure	$7.20	Limited	ND H100 v5	$5,184
CoreWeave	$4.76	Good	HGX H100	$3,427
Lambda Labs	Sold out	N/A	—	—
GCP	$6.85	Preview	a3-highgpu-8g	$4,932

io.net saves you 68-70% vs. hyperscalers, 54% vs. CoreWeave

H100 PCIe (350W TDP, Single GPU Inference)

Provider	Price/Hour (per GPU)	Availability	Instance Type	Monthly Cost per GPU (24/7)
io.net	$1.49	Good	On-demand	$1,073
AWS	$4.99	Limited	p5.2xlarge (single GPU)	$3,593
Azure	$5.40	Limited	ND H100 PCIe	$3,888
CoreWeave	$3.39	Good	H100 PCIe	$2,441
Lambda Labs	Sold out	N/A	—	—

io.net saves you 70-72% vs. hyperscalers, 56% vs. CoreWeave
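The savings percentages follow directly from the hourly rates in the two tables. A minimal sketch for checking them, where the `savings_pct` helper and the dictionary names are illustrative and not part of any io.net tooling:

```python
def savings_pct(cheaper: float, pricier: float) -> float:
    """Percent saved by paying `cheaper` instead of `pricier` (same unit)."""
    return (1 - cheaper / pricier) * 100


# Hourly per-GPU rates copied from the SXM and PCIe tables above.
IO_NET = {"SXM": 2.20, "PCIe": 1.49}
OTHERS = {
    "SXM": {"AWS": 6.98, "Azure": 7.20, "GCP": 6.85, "CoreWeave": 4.76},
    "PCIe": {"AWS": 4.99, "Azure": 5.40, "CoreWeave": 3.39},
}

for variant, rates in OTHERS.items():
    for provider, rate in rates.items():
        pct = savings_pct(IO_NET[variant], rate)
        print(f"{variant:4} vs {provider:9}: {pct:.0f}% saved")
```

Swapping in current rates is the only change needed when prices move.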

Multi-GPU H100 Cluster Pricing (8x H100 SXM)

Provider	Price/Hour (8 GPUs)	Monthly Cost (24/7)	Annual Cost	Premium vs. io.net
io.net	$17.60	$12,672	$152,064	Baseline
AWS	$55.84	$40,205	$482,458	+217%
Azure	$57.60	$41,472	$497,664	+227%
CoreWeave	$38.08	$27,418	$329,011	+116%

For teams training Llama 3 70B or GPT-class models, io.net saves $27,533/month vs. AWS
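The cluster table is plain arithmetic on the per-GPU rate: multiply by 8 GPUs, then by 720 hours for a 24/7 month, then by 12 for a year. A rough sketch under those assumptions (the helper names are hypothetical):

```python
HOURS_PER_MONTH = 720  # 30 days x 24 hours, the convention the table uses


def cluster_costs(rate_per_gpu_hr: float, gpus: int = 8) -> dict:
    """Hourly, monthly (24/7), and annual cost for a cluster."""
    hourly = rate_per_gpu_hr * gpus
    monthly = hourly * HOURS_PER_MONTH
    return {"hourly": hourly, "monthly": monthly, "annual": monthly * 12}


def premium_pct(other_hourly: float, baseline_hourly: float) -> float:
    """How much more `other_hourly` costs than `baseline_hourly`, in percent."""
    return (other_hourly / baseline_hourly - 1) * 100


# Per-GPU H100 SXM rates from the pricing tables above.
rates = {"io.net": 2.20, "AWS": 6.98, "Azure": 7.20, "CoreWeave": 4.76}
baseline = cluster_costs(rates["io.net"])

for provider, rate in rates.items():
    c = cluster_costs(rate)
    extra = premium_pct(c["hourly"], baseline["hourly"])
    print(f"{provider:9} ${c['hourly']:6.2f}/hr  ${c['monthly']:>9,.0f}/mo  "
          f"${c['annual']:>9,.0f}/yr  {extra:+.0f}%")
```

The percentage column is the extra cost relative to io.net, so the io.net row prints +0%.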

H100 SXM vs PCIe: Which Should You Choose?

NVIDIA offers two H100 variants with different use cases and pricing:

H100 SXM5 - $2.20/hr on io.net

Best for:
- Large-scale LLM training (70B+ parameter models)
- Multi-GPU distributed training
- Maximum training throughput
- Research requiring fastest iteration

Specifications:
- 700W TDP (higher power = faster performance)
- 80GB HBM3 memory @ 3.35 TB/s bandwidth
- NVLink 4.0: 900 GB/s GPU-to-GPU
- Designed for 8-GPU HGX baseboard configurations
- 67 TFLOPS FP64 (Tensor Core), ~1,979 TFLOPS FP8 dense (~3,958 TFLOPS with sparsity)

Performance:
- Up to 3x faster training vs. A100
- Llama 3 70B full fine-tuning: 48 hours (vs. 144 hours on A100)
- Stable Diffusion XL training: 6 hours (vs. 18 hours on A100)

When to use: Multi-GPU training clusters where speed matters more than cost. Training runs that would take weeks on A100.

H100 PCIe - $1.49/hr on io.net

Best for:
- High-throughput LLM inference
- Single-GPU training (<13B params)
- Evaluation and fine-tuning experiments
- Cost-sensitive production inference

Specifications:
- 350W TDP (50% lower power, cooler operation)
- 80GB HBM2e memory @ 2.0 TB/s bandwidth
- PCIe Gen5 x16 interface
- Optimized for single-GPU deployments
- 51 TFLOPS FP64 (Tensor Core), ~1,513 TFLOPS FP8 dense (~3,026 TFLOPS with sparsity)

Performance:
- 2.5x faster training vs. A100 PCIe
- Up to 9x faster inference vs. A100 (with FP8 Transformer Engine)
- Llama 3 8B inference: ~150 tokens/sec (vs. ~60 tokens/sec on A100, about 2.5x)

When to use: Inference APIs, single-GPU experiments, or when budgets are tight. Still faster than an A100, and cheaper per job.

Decision guide:
- Training >40B params or need 8+ GPU clusters? → H100 SXM ($2.20/hr)
- Inference or single-GPU training? → H100 PCIe ($1.49/hr)
- Tight budget but need modern GPU? → A100 80GB ($1.49/hr) or RTX 4090 ($0.18/hr)

Real-World H100 Cost Scenarios

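Scenario 1: 30-Day Llama 3 70B Training Run

Workload: One month of continuous training on a dedicated 8-GPU node
- Hardware needed: 8x H100 SXM with NVLink
- Training time: ~720 hours (30 days), about 5,760 GPU-hours
- Scope: sized for continued pre-training; pre-training a 70B model from scratch on 1.5T tokens would need far more than 5,760 GPU-hours
- Optimization: BF16 mixed precision, FSDP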

Provider	Cost
io.net	$2.20/hr × 8 GPUs × 720 hrs = $12,672
AWS	$6.98/hr × 8 GPUs × 720 hrs = $40,205
CoreWeave	$4.76/hr × 8 GPUs × 720 hrs = $27,418

io.net saves $27,533 vs. AWS or $14,746 vs. CoreWeave

Scenario 2: Fine-Tuning Llama 3 70B (LoRA)

Workload: LoRA fine-tuning on 10K custom examples
- Hardware needed: 4x H100 SXM
- Training time: 12 hours
- Optimization: LoRA rank 64, BF16

Provider	Cost
io.net	$2.20/hr × 4 GPUs × 12 hrs = $105.60
AWS	$6.98/hr × 4 GPUs × 12 hrs = $335.04
CoreWeave	$4.76/hr × 4 GPUs × 12 hrs = $228.48

io.net saves $229 vs. AWS or $123 vs. CoreWeave per experiment
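Both training scenarios use the same formula: hourly rate × GPU count × hours. A small sketch for pricing your own job, where `job_cost` and the scenario labels are hypothetical names chosen for illustration:

```python
def job_cost(rate_per_gpu_hr: float, gpus: int, hours: float) -> float:
    """Total cost of a job that holds `gpus` GPUs for `hours` hours."""
    return rate_per_gpu_hr * gpus * hours


# H100 SXM per-GPU hourly rates used in the scenarios above.
RATES = {"io.net": 2.20, "AWS": 6.98, "CoreWeave": 4.76}

# (GPU count, hours) for the two training scenarios.
JOBS = {"30-day 8-GPU run": (8, 720), "LoRA fine-tune": (4, 12)}

for job, (gpus, hours) in JOBS.items():
    base = job_cost(RATES["io.net"], gpus, hours)
    for provider, rate in RATES.items():
        cost = job_cost(rate, gpus, hours)
        print(f"{job:17} {provider:9} ${cost:>10,.2f}  "
              f"(io.net saves ${cost - base:,.2f})")
```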

Scenario 3: Production LLM Inference API

Workload: Serve 10M requests/day with Llama 3 70B
- Hardware needed: 3x H100 PCIe with vLLM
- Uptime: 24/7
- Optimization: Continuous batching, FP8 quantization

Provider	Monthly Cost	Cost per 1M tokens
io.net	$1.49/hr × 3 × 720 hrs = $3,218	$0.11
AWS	$4.99/hr × 3 × 720 hrs = $10,778	$0.37
CoreWeave	$3.39/hr × 3 × 720 hrs = $7,322	$0.25
OpenAI API	—	$0.60 per 1M tokens

io.net saves $7,560/month vs. AWS and costs 82% less than OpenAI API
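The table does not state how many tokens the service handles each month, so the per-token column cannot be reproduced directly. One way to sanity-check it is to back-solve the monthly volume from the io.net row and reuse that volume for the other providers. The sketch below does this; the function names are hypothetical, and the resulting volume is implied by the table's figures rather than stated in it:

```python
HOURS_PER_MONTH = 720


def implied_tokens_per_month(monthly_cost: float, cost_per_million: float) -> float:
    """Token volume at which `monthly_cost` equals the quoted per-1M price."""
    return monthly_cost / cost_per_million * 1_000_000


def cost_per_million_tokens(monthly_cost: float, tokens_per_month: float) -> float:
    """Cost per 1M tokens for a given monthly bill and token volume."""
    return monthly_cost / tokens_per_month * 1_000_000


# 3x H100 PCIe running 24/7 at each provider's hourly rate.
monthly = {
    name: rate * 3 * HOURS_PER_MONTH
    for name, rate in {"io.net": 1.49, "AWS": 4.99, "CoreWeave": 3.39}.items()
}

# Back-solve volume from the io.net row ($0.11 per 1M tokens).
tokens = implied_tokens_per_month(monthly["io.net"], 0.11)
requests_per_month = 10_000_000 * 30  # 10M requests/day over a 30-day month

print(f"Implied volume: {tokens / 1e9:.1f}B tokens/month "
      f"(~{tokens / requests_per_month:.0f} tokens per request)")
for name, bill in monthly.items():
    print(f"{name:9} ${cost_per_million_tokens(bill, tokens):.2f} per 1M tokens")
```

If your requests are longer or shorter than the implied average, the per-token cost moves in inverse proportion.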

Scenario 4: Research Lab - Daily Experimentation

Workload: Run 3-5 training experiments daily
- Hardware needed: 2x H100 PCIe
- Runtime: 6 hours/day average
- Use case: Architecture search, hyperparameter tuning

Provider	Monthly Cost	Annual Cost
io.net	$1.49/hr × 2 × 6 hrs × 30 = $536	$6,432
AWS	$4.99/hr × 2 × 6 hrs × 30 = $1,796	$21,552
CoreWeave	$3.39/hr × 2 × 6 hrs × 30 = $1,220	$14,640

io.net saves $1,260/month or $15,120/year vs. AWS

Why is io.net's H100 Pricing 70% Lower?

H100s are the most expensive GPUs to purchase ($25K-$40K each), yet io.net rents them for a fraction of competitor prices. Here's how:

1. Decentralized Supply Eliminates Data Center Costs

Traditional cloud approach:
- Purchase H100s at $30K-$40K each
- Build $500M data center with specialized cooling for 700W GPUs
- Install expensive high-speed networking (InfiniBand, NVLink switches)
- Mark up 300-500% to cover infrastructure TCO

io.net approach:
- Aggregate H100s from enterprises with spare capacity (AI labs, research institutions, crypto miners pivoting to AI)
- No data center construction - providers supply cooling and power
- Marketplace pricing: providers earn more than local rental, users pay less than cloud
- Platform fee: 10-20% vs. 300-500% traditional markup

Result: 70% cost savings passed directly to users

2. High-Utilization Economics

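H100s on io.net average 75-85% utilization vs. 40-60% on traditional clouds, where enterprises overprovision for peak capacity. Higher utilization spreads fixed costs over more billable hours, so a provider can charge less per hour and still earn the same monthly revenue per GPU.

Math: A provider earning $1.80/hr at 80% utilization makes about $1,037/month per GPU ($1.80 × 0.80 × 720 hrs). At 50% utilization, the same provider would need to charge $2.88/hr to earn that amount. AWS at $6.98/hr and 50% utilization makes about $2,513/month per GPU, with the idle hours priced into the hourly rate users pay.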
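The utilization math can be sketched as follows. This is only an illustration of the arithmetic; the helper names are hypothetical, and 720 hours per month is the same 30-day convention used in the pricing tables:

```python
HOURS_PER_MONTH = 720


def monthly_revenue(rate_per_hr: float, utilization: float) -> float:
    """Revenue per GPU per month when rented `utilization` of the time."""
    return rate_per_hr * utilization * HOURS_PER_MONTH


def break_even_rate(target_revenue: float, utilization: float) -> float:
    """Hourly rate needed to reach `target_revenue` at a given utilization."""
    return target_revenue / (utilization * HOURS_PER_MONTH)


provider = monthly_revenue(1.80, 0.80)   # marketplace provider
aws = monthly_revenue(6.98, 0.50)        # hyperscaler at lower utilization

print(f"Provider at 80%: ${provider:,.2f}/month")
print(f"AWS at 50%:      ${aws:,.2f}/month")
print(f"Rate needed at 50% to match the provider: "
      f"${break_even_rate(provider, 0.50):.2f}/hr")
```

The last line shows the effect in isolation: dropping from 80% to 50% utilization raises the hourly rate needed for the same revenue by 60%.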

3. Global Arbitrage

io.net sources H100s globally, optimizing for electricity costs:
- Quebec, Canada: $0.05/kWh (hydro)
- Iceland: $0.06/kWh (geothermal)
- Norway: $0.08/kWh (hydro)

AWS concentrates H100s in us-east-1 ($0.15-$0.20/kWh). For a 700W H100 running 24/7 (about 504 kWh/month), electricity alone costs $75-100/month on AWS vs. $25-40 on io.net.
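The electricity figures come from power draw × hours × price per kWh. A minimal sketch, assuming the GPU draws its full 700W TDP around the clock and ignoring cooling and host overhead (the function name is illustrative):

```python
def monthly_power_cost(watts: float, price_per_kwh: float, hours: float = 720) -> float:
    """Electricity cost for a device drawing `watts` continuously."""
    kwh = watts / 1000 * hours
    return kwh * price_per_kwh


# Electricity prices quoted in the list above, in $/kWh.
PRICES = {
    "Quebec (hydro)": 0.05,
    "Iceland (geothermal)": 0.06,
    "Norway (hydro)": 0.08,
    "us-east-1 (low)": 0.15,
    "us-east-1 (high)": 0.20,
}

for region, price in PRICES.items():
    print(f"{region:22} ${monthly_power_cost(700, price):6.2f}/month")
```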

4. No Enterprise Overhead

AWS pricing includes:
- Enterprise sales teams and account managers
- Premium support tiers
- Marketing and customer acquisition costs
- Shareholder profit margins

io.net is self-serve with community support, eliminating 40-60% of traditional cloud overhead.

H100 Performance Benchmarks

Here's what you get for your money:

Training Performance (Llama 3 8B, 10K steps)

GPU	Time	Cost (io.net)	Cost (AWS)	Throughput
H100 SXM	2.3 hrs	$5.06	$16.05	100%
H100 PCIe	2.8 hrs	$4.17	$13.97	82%
A100 80GB	6.5 hrs	$9.75	$26.65	35%
RTX 4090	8.2 hrs	$1.48	N/A	28%

Insight: Among data center GPUs, H100 PCIe has the lowest cost per run; the RTX 4090 is cheaper still but takes about 3.6x as long as H100 SXM
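Cost per run is run time × hourly rate, and the throughput column is each GPU's speed relative to the fastest one. A rough sketch that recomputes both from the table's run times; the A100 rate of $1.50/hr is an assumption back-derived from the table's $9.75 over 6.5 hours, and the names are hypothetical:

```python
# (run time in hours, io.net hourly rate) per GPU.
# The A100 rate is inferred from the table, not quoted directly.
RUNS = {
    "H100 SXM": (2.3, 2.20),
    "H100 PCIe": (2.8, 1.49),
    "A100 80GB": (6.5, 1.50),
    "RTX 4090": (8.2, 0.18),
}


def run_cost(hours: float, rate_per_hr: float) -> float:
    """Cost of one training run on a single GPU."""
    return hours * rate_per_hr


def relative_throughput(hours: float, fastest_hours: float) -> float:
    """Speed relative to the fastest GPU, as a percentage."""
    return fastest_hours / hours * 100


fastest = min(hours for hours, _ in RUNS.values())
for gpu, (hours, rate) in RUNS.items():
    print(f"{gpu:10} {hours:4.1f} hrs  ${run_cost(hours, rate):5.2f}  "
          f"{relative_throughput(hours, fastest):3.0f}%")
```

Ranking by cost alone favors the slowest card, so weigh cost per run against how long you can wait for results.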

Inference Performance (Llama 3 70B, vLLM, batch=8)

GPU	Tokens/sec	Cost per 10K tokens (io.net)	Cost per 10K tokens (AWS)
H100 SXM	185	$0.033	$0.105
H100 PCIe	152	$0.027	$0.091
A100 80GB	62	$0.054	$0.146
L40S	98	$0.021	$0.043

Insight: L40S ($0.75/hr) offers better cost/token for inference than H100
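Cost per token follows from hourly price and sustained throughput: cost = rate / (tokens per second × 3,600) × token count. At 152 tokens/sec and $1.49/hr this gives about $0.027 per 10,000 tokens, the figure listed for H100 PCIe. A minimal sketch, with a hypothetical function name and the A100 rate ($1.20/hr) taken from the FAQ section later in this article:

```python
def cost_per_tokens(rate_per_hr: float, tokens_per_sec: float, n_tokens: int) -> float:
    """Cost to generate `n_tokens` at a sustained throughput and hourly rate."""
    tokens_per_hour = tokens_per_sec * 3600
    return rate_per_hr / tokens_per_hour * n_tokens


# (tokens/sec from the benchmark table, io.net hourly rate)
GPUS = {
    "H100 SXM": (185, 2.20),
    "H100 PCIe": (152, 1.49),
    "A100 80GB": (62, 1.20),
    "L40S": (98, 0.75),
}

for gpu, (tps, rate) in GPUS.items():
    per_10k = cost_per_tokens(rate, tps, 10_000)
    per_1m = cost_per_tokens(rate, tps, 1_000_000)
    print(f"{gpu:10} ${per_10k:.3f} per 10K tokens  ${per_1m:.2f} per 1M tokens")
```

Higher aggregate throughput from larger batches lowers these numbers proportionally, so measure tokens/sec at your own batch size before comparing.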

Multi-GPU Scaling (Llama 3 70B Training)

Configuration	Training Time	Cost (io.net)	Scaling Efficiency
1x H100 SXM	240 hrs	$528	100%
2x H100 SXM	125 hrs	$550	96%
4x H100 SXM	65 hrs	$572	92%
8x H100 SXM	35 hrs	$616	86%

Insight: Near-linear scaling up to 8 GPUs thanks to NVLink
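Scaling efficiency here is the single-GPU time divided by (GPU count × multi-GPU time), and cost is hours × GPUs × hourly rate. A short sketch that recomputes the table under those definitions (the helper names are illustrative):

```python
RATE = 2.20  # io.net H100 SXM, $/GPU-hour

# GPU count -> training time in hours, from the table above.
RUNS = {1: 240, 2: 125, 4: 65, 8: 35}


def scaling_efficiency(t1: float, tn: float, n: int) -> float:
    """Fraction of ideal linear speedup achieved on `n` GPUs."""
    return t1 / (n * tn)


def run_cost(hours: float, gpus: int, rate: float = RATE) -> float:
    """Total cost of a run that holds `gpus` GPUs for `hours` hours."""
    return hours * gpus * rate


t1 = RUNS[1]
for n, hours in RUNS.items():
    eff = scaling_efficiency(t1, hours, n) * 100
    print(f"{n}x H100 SXM  {hours:3d} hrs  ${run_cost(hours, n):.0f}  {eff:.0f}%")
```

Going from 1 to 8 GPUs cuts wall-clock time by almost 7x while total cost rises by about 17%, which is the price of the lost efficiency.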

How to Maximize Value from H100 Rentals

1. Use H100 PCIe for Inference

For LLM inference APIs, H100 PCIe ($1.49/hr) delivers 82% of SXM throughput at 68% of the hourly cost. Combined with vLLM continuous batching and FP8 quantization, the production inference scenario above works out to about $0.11 per 1M tokens - 82% cheaper than the $0.60 OpenAI API rate.

2. Optimize Training with Mixed Precision

H100's FP8 Transformer Engine can accelerate training by up to 2x over FP16:
- Enable FP8 through NVIDIA Transformer Engine (fp8_autocast) or Hugging Face Accelerate (mixed_precision="fp8")
- Use BF16 for layers that do not run in FP8
- Result: a 2x speedup halves GPU-hours, a 50% cost reduction

3. Batch Aggressively for Inference

H100's 80GB memory enables massive batch sizes:
- Llama 3 8B: batch size 128+ (vs. 32 on A100)
- 4x higher throughput per GPU
- 75% cost reduction per request
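The link between speedup and savings is the same for the 2x FP8 figure and the 4x batching figure: at a fixed hourly price, cost per unit of work falls by 1 - 1/speedup. A minimal sketch, assuming the speedup applies to the whole job and the hourly rate does not change (the function name is hypothetical):

```python
def cost_reduction_pct(speedup: float) -> float:
    """Percent cost reduction per unit of work at a fixed hourly price."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1 - 1 / speedup) * 100


for label, speedup in [("FP8 training (2x)", 2), ("Batched inference (4x)", 4)]:
    print(f"{label:24} {cost_reduction_pct(speedup):.0f}% lower cost")
```

Returns diminish quickly: going from 4x to 8x only moves the reduction from 75% to 87.5%.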

4. Use Spot-Like Pricing on io.net

While io.net doesn't offer "spot instances" (all instances are stable), prices during off-peak hours (2am-8am UTC) are sometimes 10-15% lower due to marketplace dynamics. Schedule batch training jobs overnight for additional savings.

5. Right-Size Your GPU Choice

Don't overpay for H100 if you don't need it:
- Fine-tuning <13B params? RTX 4090 ($0.18/hr) is 92% cheaper than H100 SXM
- Inference for <33B params? L40S ($0.75/hr) offers better value
- Training 70B+ or need 8+ GPU clusters? H100 SXM is worth it

Can I rent a single H100 or do I need to rent 8?

You can rent as few as 1 H100 on io.net. While H100 SXM GPUs are designed for 8-GPU HGX baseboards, io.net's marketplace includes individual H100s for single-GPU workloads. For multi-GPU training, deploy 2, 4, or 8 H100s with NVLink connectivity. No minimums, pay per second.

How does H100 compare to H200?

The H200 (announced November 2023, shipping since 2024) offers 141GB HBM3e memory vs. H100's 80GB - ideal for training 175B+ parameter models. Training performance is similar (+5-10% from memory bandwidth improvements). H200s are not yet widely available on cloud platforms. When available on io.net, expect $2.50-$3.00/hr pricing (still 60% below hyperscaler rates).

Is the H100 worth it vs A100 for inference?

For inference, H100 PCIe ($1.49/hr) is up to 9x faster than A100 ($1.20/hr) thanks to the FP8 Transformer Engine; in the Llama 3 70B benchmark above it delivers about 2.5x the throughput (152 vs. 62 tokens/sec). Cost per token is 40-50% lower on H100 despite the higher hourly rate. For maximum cost efficiency, L40S ($0.75/hr) offers the best $/token for models up to 70B params.

Can I run H100s on io.net with InfiniBand?

Yes. Multi-GPU H100 SXM clusters on io.net include NVLink 4.0 (900 GB/s GPU-to-GPU) and select providers offer InfiniBand networking for distributed training across nodes. For 8+ GPU clusters, specify InfiniBand requirement when deploying.

How long until H100s become cheaper?

NVIDIA's H200 and GB200 (Blackwell) releases will create downward price pressure on H100s in 2026-2027. Expect 20-30% price reductions as newer GPUs enter the market. However, io.net's decentralized model already prices H100s 70% below hyperscalers, so absolute prices on io.net may not drop significantly; instead, the competitive gap will narrow as AWS and Azure reduce their rates.

Get Started with H100 GPUs at 70% Savings

Stop overpaying for the world's fastest AI training GPUs:

✅ H100 SXM for $2.20/hr (vs. $6.98/hr on AWS) - 68% savings
✅ H100 PCIe for $1.49/hr (vs. $4.99/hr on AWS) - 70% savings
✅ Instant availability - no waitlists or reservations
✅ Multi-GPU clusters - scale from 1 to 100+ GPUs

Deploy H100 cluster now → | Compare all GPU pricing →

Pricing updated April 2026 | Benchmarks from internal testing and MLPerf results