The NVIDIA H100 remains the workhorse GPU for serious AI workloads in 2026. Whether you are training a large language model from scratch, fine-tuning an open-weight foundation model, or running high-throughput inference at scale, the H100's 80 GB of high-bandwidth memory and dedicated Transformer Engine make it the default choice for production ML teams.
But the H100 cloud GPU rental market has changed dramatically. Prices have dropped 60-75% from their 2024 peaks, new providers have entered the space, and availability constraints have eased considerably. An H100 that cost $8-$10/hr on-demand two years ago now starts at roughly $2.10/hr from the most competitive providers.
This guide breaks down exactly where to rent H100 GPUs in 2026, what each provider charges, and how to choose the right one for your workload. We cover six major providers, compare SXM and PCIe variants, estimate real-world costs for common workloads, and give you a decision framework that goes beyond sticker price.
If you are an ML engineer, infrastructure lead, or startup CTO evaluating GPU cloud options, this is the reference you need.
H100 Specifications Quick Reference
Before comparing pricing, it helps to understand what you are renting. The H100 ships in two form factors, and the differences matter for multi-GPU training workloads.
H100 SXM5 vs. H100 PCIe: Key Differences
| Specification | H100 SXM5 | H100 PCIe |
|---|---|---|
| GPU Memory | 80 GB HBM3 | 80 GB HBM2e |
| Memory Bandwidth | 3.35 TB/s | 2.0 TB/s |
| FP8 Tensor Core (with sparsity) | 3,958 TFLOPS | 3,026 TFLOPS |
| FP16 Tensor Core (with sparsity) | 1,979 TFLOPS | 1,513 TFLOPS |
| FP32 | 67 TFLOPS | 51 TFLOPS |
| TDP | Up to 700W | 300-350W |
| Interconnect | NVLink 4.0 (900 GB/s) | PCIe Gen5 (128 GB/s) |
| Multi-GPU Scaling | Excellent (NVSwitch) | Limited |
| Best For | Distributed training, large clusters | Inference, single-GPU fine-tuning |
The bottom line: SXM5 is the variant you want for multi-GPU training. Its NVLink interconnect delivers 7x the bandwidth of PCIe Gen5, which translates directly into faster gradient synchronization across GPUs. For single-GPU inference or small fine-tuning jobs, PCIe offers the same 80 GB VRAM at a lower price point.
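To see why that bandwidth gap matters, here is a back-of-envelope sketch of one full gradient synchronization using the textbook ring all-reduce cost model. It assumes a 7B-parameter model with BF16 gradients on 8 GPUs, treats the 900 GB/s and 128 GB/s figures as bidirectional totals (so half is available in each direction), and ignores latency, compute overlap, and host topology; real PCIe systems that route traffic through the CPU often do worse. The helper name is illustrative.

```python
def ring_allreduce_seconds(num_params: int, bytes_per_param: int,
                           num_gpus: int, gb_per_s_per_direction: float) -> float:
    """Idealized ring all-reduce time: bandwidth term only, no latency or overlap."""
    payload_bytes = num_params * bytes_per_param
    # In a ring all-reduce, each GPU sends 2 * (N - 1) / N of the payload.
    bytes_sent_per_gpu = 2 * (num_gpus - 1) / num_gpus * payload_bytes
    return bytes_sent_per_gpu / (gb_per_s_per_direction * 1e9)


# Assumption: the spec table's figures are bidirectional, so halve them per direction.
LINKS = {"NVLink 4.0": 900 / 2, "PCIe Gen5": 128 / 2}

for name, bandwidth in LINKS.items():
    seconds = ring_allreduce_seconds(7_000_000_000, 2, 8, bandwidth)
    print(f"{name}: ~{seconds * 1000:.0f} ms per full-gradient all-reduce")
```

Under these assumptions the PCIe path takes about 7x longer per synchronization, mirroring the bandwidth ratio. Training frameworks overlap much of this traffic with compute, but the gap still shows up as lower multi-GPU scaling efficiency.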
What Makes the H100 Different from A100
The H100 is built on NVIDIA's Hopper architecture and delivers roughly 3x the AI training performance of the A100 (Ampere). Three features drive this:
- Transformer Engine -- Dynamically selects FP8 or 16-bit precision for transformer layers and manages FP8 scaling automatically, nearly doubling throughput for LLM workloads while keeping accuracy close to 16-bit results.
- HBM3 Memory -- 3.35 TB/s bandwidth on SXM (vs. about 2.0 TB/s on the A100 80GB SXM), reducing memory-bound bottlenecks during large-batch training.
- Fourth-Generation NVLink -- 900 GB/s per GPU (vs. 600 GB/s on A100), enabling tighter multi-GPU scaling for distributed training runs.
Complete H100 Cloud Pricing Comparison (April 2026)
Here is what you will actually pay across six major providers. All rates are per-GPU, per-hour, on-demand pricing unless noted otherwise.
| Provider | H100 Variant | $/GPU/Hour | 8-GPU Cluster (Monthly)* | Availability | Min. Commitment |
|---|---|---|---|---|---|
| io.net | SXM | $2.10 - $3.50 | $12,096 - $20,160 | Instant (< 2 min deploy) | None |
| RunPod (Community) | PCIe | $1.99 | $11,462 | Good | None |
| RunPod (Community) | SXM | $2.69 | $15,494 | Good | None |
| RunPod (Secure) | SXM | ~$3.50 | ~$20,160 | Good | None |
| Lambda Labs | SXM | $2.99 | $17,222 | Moderate | None |
| CoreWeave | SXM (HGX) | $6.16 | $35,481 | By request | Reserved preferred |
| AWS EC2 P5 | SXM | ~$3.90 - $6.88 | $22,464 - $39,629 | Capacity Blocks | Varies |
| Google Cloud A3 | SXM | ~$9.80 - $14.19 | $56,448 - $81,734 | On-demand | None |
*Monthly estimate = hourly rate x 8 GPUs x 720 hours. Actual costs will be lower if you are not running 24/7.
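The footnote's formula is easy to adapt for part-time use. The sketch below adds a utilization fraction, on the assumption that your provider bills only for the hours the cluster is running (check each provider's billing granularity); the function name is illustrative.

```python
HOURS_PER_MONTH = 720  # the table's convention: 30 days x 24 hours


def monthly_cost(rate_per_gpu_hour: float, num_gpus: int = 8,
                 utilization: float = 1.0) -> float:
    """Monthly cluster cost, assuming you are billed only for hours the cluster runs.

    utilization is the fraction of the month the cluster is up (0 to 1);
    1.0 reproduces the table's 24/7 figures.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return rate_per_gpu_hour * num_gpus * HOURS_PER_MONTH * utilization


print(round(monthly_cost(2.10)))                   # 24/7 -> 12096
print(round(monthly_cost(2.10, utilization=0.5)))  # half the month -> 6048
```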
Key takeaways from the pricing table:
- io.net offers the lowest SXM pricing in this comparison, starting at $2.10/hr -- roughly 79% less than Google Cloud's $9.80/hr floor and 46% less than AWS's $3.90/hr.
- RunPod's community cloud PCIe at $1.99/hr is the cheapest entry point, but PCIe limits multi-GPU training performance.
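- The hyperscaler premium is real. Measured against io.net's $2.10/hr floor, AWS charges roughly 1.9-3.3x and Google Cloud roughly 4.7-6.8x as much for the same silicon.
- CoreWeave's on-demand rate ($6.16/hr) is well above the specialized providers and near the top of AWS's range, though still below Google Cloud; its pricing is geared toward reserved contracts rather than on-demand use.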

Provider-by-Provider Breakdown
io.net -- Best Value for H100 Clusters
Pricing: $2.10 - $3.50/hr per H100 SXM GPU
io.net operates a decentralized GPU cloud (DePIN) that aggregates over 320,000 GPUs from data centers across 130+ countries. This distributed supply model is what enables io.net's pricing advantage: rather than a single company bearing the full capital expenditure of GPU procurement, the network incentivizes independent operators to contribute hardware and earn token-based compensation.
Why teams choose io.net:
- Instant availability. H100 clusters deploy in under 2 minutes. No waitlists, no sales calls, no capacity block reservations. You select your GPU count and your cluster is live.
- Flexible deployment options. Support for Ray, Kubernetes, containers, VMs, and bare metal. You are not locked into a single orchestration paradigm.
- io.intelligence. Access to 25+ open-source and proprietary models through an OpenAI-compatible API. If you need inference alongside training, it is available on the same platform.
- Scale without friction. Need 8 GPUs today and 64 tomorrow? The decentralized supply pool scales dynamically without procurement lead times.
- Cost savings of roughly 46-69% versus AWS and 79-85% versus Google Cloud on equivalent H100 hardware, based on io.net's $2.10/hr floor and the on-demand ranges above.
Best for: Teams that need on-demand H100 clusters at the lowest cost, startups optimizing burn rate, researchers who need flexible scaling, and any workload where time-to-deploy matters.
Considerations: Because io.net is a decentralized network, its hardware comes from verified independent operators rather than a single hyperscaler data center. io.net's verification layer is designed to check GPU authenticity and performance consistency.
RunPod -- Serverless + On-Demand Flexibility
Pricing: $1.99/hr (PCIe, Community) | $2.69/hr (SXM, Community) | ~$3.50/hr (SXM, Secure)
RunPod has carved out a strong position as a developer-friendly GPU cloud with both on-demand instances and a serverless inference platform. They offer two tiers: Community Cloud (multi-tenant, lower cost) and Secure Cloud (single-tenant, higher reliability).
Why teams choose RunPod:
- Serverless GPU inference. Pay-per-request pricing for inference workloads -- no idle GPU costs.
- Community Cloud pricing is genuinely competitive, especially for PCIe workloads.
- Good developer experience. Templates, one-click deployments, and an active community.
- Commitment discounts. 6-month commitments bring H100 SXM down to $2.60/hr; 12-month to $2.54/hr.
Best for: Individual developers and small teams running inference workloads, fine-tuning experiments, or projects that benefit from serverless scaling.
Considerations: Community Cloud instances are multi-tenant and may have variable availability. For production training workloads requiring guaranteed uptime, Secure Cloud pricing is closer to $3.50/hr. PCIe instances are cheaper but lack NVLink for multi-GPU scaling.
Lambda Labs -- Simple Pricing, ML-Native
Pricing: $2.99/hr per H100 SXM GPU (8x cluster)
Lambda Labs is a well-known name in the ML community, offering a straightforward cloud with clean pricing and a focus on machine learning workflows. Their 8xH100 instances run on dedicated hardware with InfiniBand networking.
Why teams choose Lambda:
- Transparent pricing. One price, no tiers, no hidden costs.
- ML-first tooling. Pre-built environments with PyTorch, TensorFlow, JAX, and common ML libraries.
- Strong reputation in the academic and research community.
- Persistent storage available across sessions.
Best for: Research teams and academic labs that want a simple, no-surprises GPU cloud experience.
Considerations: Availability can be limited -- Lambda's supply is smaller than hyperscalers or decentralized networks. Geographic coverage is primarily US-based. No spot or preemptible pricing tier.
CoreWeave -- Enterprise-Grade, NVIDIA-Backed
Pricing: ~$6.16/hr per H100 SXM GPU (HGX 8x nodes)
CoreWeave is an NVIDIA-backed GPU cloud provider that targets enterprise customers with large-scale, long-term compute needs. They focus on providing bare-metal-like performance with managed Kubernetes orchestration.
Why teams choose CoreWeave:
- NVIDIA partnership ensures early access to newest hardware.
- InfiniBand networking across H100 clusters for optimal multi-node training.
- Kubernetes-native. Purpose-built for containerized ML pipelines.
- Enterprise SLAs and dedicated support.
Best for: Enterprise teams with large, sustained GPU needs who can commit to reserved capacity and need provider-level SLAs.
Considerations: On-demand pricing is significantly higher than indie GPU clouds. CoreWeave is optimized for long-term reserved contracts rather than on-demand usage. Not the right fit for teams with variable or unpredictable workloads.
AWS EC2 P5 -- The Hyperscaler Default
Pricing: ~$3.90 - $6.88/hr per H100 GPU (varies by region and pricing model)
AWS offers H100 GPUs through P5 instances (p5.48xlarge with 8x H100 SXM GPUs). As the dominant cloud provider, AWS benefits from ecosystem integration -- if your data, pipelines, and infrastructure already live on AWS, P5 instances plug in without migration.
Why teams choose AWS:
- Ecosystem integration. SageMaker, S3, EKS, and the rest of the AWS stack.
- Capacity Blocks. Reserve GPU capacity for defined time windows, reducing on-demand costs.
- Spot instances. H100 spot pricing can drop to ~$2.50/GPU/hr when available.
- Global availability across multiple regions.
- Savings Plans. 1-year and 3-year commitments reduce costs by 25-40%.
Best for: Teams already invested in the AWS ecosystem, enterprises with existing AWS enterprise agreements, and workloads that require tight integration with AWS services.
Considerations: On-demand P5 pricing runs up to roughly 3.3x the rates of specialized GPU clouds ($6.88/hr vs. io.net's $2.10/hr). Spot instance availability is unpredictable. AWS's pricing complexity (on-demand, spot, Capacity Blocks, Savings Plans, Reserved Instances) adds operational overhead. Deployment and configuration are more involved than with purpose-built GPU cloud providers.
Google Cloud A3 -- Premium Pricing, TPU Alternative
Pricing: ~$9.80 - $14.19/hr per H100 GPU (A3 High, on-demand)
Google Cloud offers H100 GPUs through A3 accelerator-optimized machine types. GCP's H100 pricing is the highest among the providers in this guide, reflecting Google's positioning as a premium enterprise cloud.
Why teams choose GCP:
- Vertex AI integration. End-to-end ML platform with managed training and serving.
- TPU alternative path. Teams can benchmark H100 performance against TPU v5e for cost optimization.
- BigQuery and data pipeline integration for teams with data already in GCP.
- Committed use discounts. 1-year and 3-year commitments bring pricing down significantly.
Best for: Teams already committed to the GCP ecosystem, organizations using Vertex AI, and workloads that benefit from Google's data infrastructure.
Considerations: On-demand pricing is the most expensive in this comparison. Spot/preemptible discounts are available but with the risk of preemption during training runs. The cost gap between GCP and specialized GPU clouds is substantial -- a team could run the same workload on io.net for roughly 64-85% less, depending on where in each provider's price range you compare.
H100 vs. A100: When Do You Actually Need an H100?
Not every workload requires an H100. A100 GPUs are still widely available at lower price points ($1.00-$2.50/hr depending on provider), and for certain workloads, they deliver sufficient performance at meaningfully lower cost.
When the H100 Is Worth the Premium
| Workload | H100 Advantage | Recommendation |
|---|---|---|
| Training LLMs (7B+ parameters) | 2-3x faster training via Transformer Engine + HBM3 bandwidth | H100 SXM strongly recommended |
| Multi-node distributed training | NVLink 4.0 at 900 GB/s cuts communication overhead within each node; cross-node traffic also depends on InfiniBand-class networking | H100 SXM required for efficiency |
| FP8 inference at scale | Native FP8 support in Transformer Engine, ~2x throughput vs. A100 | H100 (SXM or PCIe) |
| Serving large models (70B+) | 80 GB HBM3 with higher bandwidth reduces latency | H100 recommended |
When the A100 Is Sufficient
| Workload | Why A100 Works | Recommendation |
|---|---|---|
| Fine-tuning models < 13B | Fits in 80 GB VRAM with LoRA/QLoRA (full fine-tuning of a 13B model needs far more); training time difference is tolerable | A100 80GB |
| Inference for smaller models | Batch sizes fit in memory; latency requirements are modest | A100 40/80GB |
| Prototyping and experimentation | Cost matters more than speed during exploration | A100 or even L40S |
| Data preprocessing / ETL | Does not use FP8, the Transformer Engine, or NVLink, so the H100's main advantages go unused | A100 40GB |
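Whether a model "fits in 80 GB" depends on the fine-tuning method. A common rule of thumb for full fine-tuning with mixed-precision Adam is about 16 bytes per parameter (16-bit weights and gradients, plus FP32 master weights and two Adam moments), versus roughly 2 bytes per parameter for a frozen 16-bit base under LoRA and about 0.5 for a 4-bit QLoRA base. The sketch below applies those assumed figures to a 13B model; it ignores activations, adapter weights, and framework overhead, so treat the results as lower bounds.

```python
# Rule-of-thumb bytes per parameter for weights, gradients, and optimizer state.
# Activations, LoRA adapter weights, and framework overhead are NOT included.
BYTES_PER_PARAM = {
    "full fine-tune (mixed-precision Adam)": 16,  # 2 + 2 + 4 (FP32 master) + 8 (Adam m, v)
    "LoRA (frozen 16-bit base)": 2,
    "QLoRA (frozen 4-bit base)": 0.5,
}
GPU_MEMORY_GB = 80


def model_state_gb(num_params: float, bytes_per_param: float) -> float:
    return num_params * bytes_per_param / 1e9


for method, bpp in BYTES_PER_PARAM.items():
    gb = model_state_gb(13e9, bpp)
    verdict = "fits" if gb < GPU_MEMORY_GB else "does not fit"
    print(f"13B, {method}: ~{gb:.1f} GB, {verdict} in {GPU_MEMORY_GB} GB before activations")
```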
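The rule of thumb: the H100 is the cheaper choice whenever its speedup on your workload is larger than its hourly price premium over the A100, and the dollar savings grow with job length, so the decision matters most for runs that take more than about 24 hours on an A100. Assuming the ~3x speedup, a 72-hour A100 job becomes roughly a 24-hour H100 job -- about $50 per GPU at $2.10/hr, versus $72-$180 per GPU for the A100 run at $1.00-$2.50/hr.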
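The comparison reduces to one inequality: per job, the H100 is cheaper when its speedup exceeds the ratio of its hourly rate to the A100's, regardless of how long the job runs. The sketch below assumes the ~3x speedup and this guide's price points; the helper names are illustrative, and the speedup you measure on your own model is the number that actually matters.

```python
def job_cost(baseline_hours: float, rate_per_hour: float, speedup: float = 1.0) -> float:
    """Per-GPU cost of a job that takes baseline_hours on the A100."""
    return baseline_hours / speedup * rate_per_hour


def h100_is_cheaper(a100_rate: float, h100_rate: float, speedup: float) -> bool:
    # Job length cancels out: the H100 wins when speedup > h100_rate / a100_rate.
    return speedup > h100_rate / a100_rate


a100 = job_cost(72, 1.00)             # 72-hour A100 job at $1.00/hr
h100 = job_cost(72, 2.10, speedup=3)  # same job on an H100 at $2.10/hr, assumed 3x faster
print(f"A100: ${a100:.2f}  H100: ${h100:.2f}")  # A100: $72.00  H100: $50.40
print(h100_is_cheaper(1.00, 2.10, speedup=3))   # True
print(h100_is_cheaper(1.00, 2.10, speedup=1.8)) # False
```

If your real speedup is only 1.8x, the H100 at $2.10/hr loses to a $1.00/hr A100 on every job length.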
How to Choose an H100 Cloud Provider
Price per GPU-hour is the obvious comparison point, but it is rarely the only factor that matters. Here is a decision framework for evaluating providers based on what actually drives total cost and productivity.
Decision Framework
| Factor | Questions to Ask | Why It Matters |
|---|---|---|
| Effective hourly cost | What is the all-in cost including networking, storage, and egress? | Sticker price can be misleading if storage and egress add 20-30% |
| Availability | Can I get GPUs when I need them, or is there a waitlist? | A cheaper GPU you cannot access is not cheaper |
| Time to deploy | How long from "I need GPUs" to "my job is running"? | Idle engineer time has a cost. Minutes vs. days matters |
| Interconnect | NVLink, InfiniBand, or PCIe only? | Critical for multi-GPU training; irrelevant for single-GPU inference |
| Orchestration | Kubernetes, Ray, bare metal, or managed? | Match your team's existing workflow |
| Commitment required | On-demand, or must I reserve for months? | Startups and research teams need flexibility |
| Geographic coverage | Regions available? Data residency requirements? | Compliance and latency considerations |
| Ecosystem lock-in | How much of my pipeline depends on this provider? | Migration cost increases with integration depth |
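A quick way to apply the "effective hourly cost" row is to fold storage, networking, and egress into a percentage overhead on the sticker rate. The provider names and overhead percentages below are hypothetical placeholders, not published figures; the point is that a 30% overhead is enough to flip the ranking between two offers that look 28% apart on paper.

```python
def effective_hourly(sticker_rate: float, overhead_fraction: float) -> float:
    """All-in $/GPU-hour when storage, networking, and egress add a proportional
    overhead on top of the sticker rate (e.g. 0.30 for 30%)."""
    return sticker_rate * (1 + overhead_fraction)


# Hypothetical offers: (sticker $/GPU/hr, assumed overhead fraction).
offers = {"Provider A": (2.10, 0.30), "Provider B": (2.69, 0.00)}

for name, (rate, overhead) in offers.items():
    print(f"{name}: sticker ${rate:.2f}/hr, effective ${effective_hourly(rate, overhead):.2f}/hr")
```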
Quick Recommendation by Use Case
| Use Case | Recommended Provider | Why |
|---|---|---|
| Cost-sensitive training | io.net | Lowest H100 SXM pricing, instant deploy, no commitment |
| Serverless inference | RunPod | Pay-per-request serverless tier, good developer tools |
| Academic research | Lambda Labs | Simple pricing, ML-native environment |
| Enterprise (large contracts) | CoreWeave | NVIDIA-backed, enterprise SLAs, InfiniBand |
| AWS-native teams | AWS P5 | Ecosystem integration, Savings Plans |
| GCP-native teams | Google Cloud A3 | Vertex AI, BigQuery integration |
| Budget-conscious startups | io.net | 46-85% below hyperscaler on-demand rates at the $2.10/hr floor, scales with you |
Common H100 Workloads & Cost Estimates
Real costs depend on training duration, batch size, model architecture, and optimization. The following estimates use representative workloads to compare total job cost across providers.
LLM Training (7B Parameter Model, Full Pre-Training)
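A full pre-training run on a 7B parameter model over a modest budget of roughly 100B tokens takes about 14 days on 8x H100 SXM GPUs, assuming good utilization; runs over trillions of tokens take proportionally longer.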
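Pre-training duration scales with model size and token count. A widely used approximation puts training compute at about 6 x N x D FLOPs for N parameters and D tokens; dividing by sustained throughput gives wall-clock time. The sketch below assumes 989 TFLOPS dense BF16 per H100 SXM (half the 1,979 TFLOPS sparsity figure in the spec table), 40% model FLOPs utilization (MFU), and a 100B-token budget; these are illustrative assumptions, not numbers from a specific published run.

```python
H100_SXM_DENSE_BF16_FLOPS = 989e12  # dense; the spec table's 1,979 TFLOPS assumes sparsity
SECONDS_PER_DAY = 86_400


def training_days(num_params: float, num_tokens: float, num_gpus: int, mfu: float,
                  peak_flops: float = H100_SXM_DENSE_BF16_FLOPS) -> float:
    """Wall-clock days from the ~6 * N * D training-FLOPs approximation.

    mfu is model FLOPs utilization: the fraction of peak throughput actually sustained.
    """
    total_flops = 6 * num_params * num_tokens
    sustained_flops_per_s = num_gpus * peak_flops * mfu
    return total_flops / sustained_flops_per_s / SECONDS_PER_DAY


# Assumptions: 7B parameters, 100B tokens, 8 GPUs, 40% MFU.
print(f"~{training_days(7e9, 100e9, 8, 0.40):.1f} days")
```

With these inputs the estimate lands near 15 days; because time scales linearly with tokens, a 1T-token run on the same eight GPUs would take about ten times as long.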
| Provider | $/GPU/Hr | GPU-Hours (8 x 336) | Total Cost |
|---|---|---|---|
| io.net | $2.10 | 2,688 | $5,645 |
| RunPod (SXM, Community) | $2.69 | 2,688 | $7,231 |
| Lambda Labs | $2.99 | 2,688 | $8,037 |
| CoreWeave | $6.16 | 2,688 | $16,558 |
| AWS P5 (on-demand) | $3.90 | 2,688 | $10,483 |
| Google Cloud A3 | $9.80 | 2,688 | $26,342 |
Savings with io.net vs. AWS: $4,838 (46% less)
Savings with io.net vs. Google Cloud: $20,697 (79% less)
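The savings lines are simple differences between the rounded totals in the table. The sketch below reproduces them so you can swap in your own rates or GPU-hours; the helper name is illustrative.

```python
GPU_HOURS = 8 * 336  # 8 GPUs for 14 days


def savings(cheaper_total: float, pricier_total: float) -> tuple[float, float]:
    """Absolute dollars saved and the percentage saved relative to the pricier option."""
    saved = pricier_total - cheaper_total
    return saved, 100 * saved / pricier_total


# Round totals to whole dollars, as the table does.
io_net_total = round(2.10 * GPU_HOURS)
for name, rate in [("AWS", 3.90), ("Google Cloud", 9.80)]:
    saved, pct = savings(io_net_total, round(rate * GPU_HOURS))
    print(f"Savings with io.net vs. {name}: ${saved:,} ({pct:.0f}% less)")
```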
Fine-Tuning (70B Model, LoRA/QLoRA)
Fine-tuning a 70B model with LoRA typically takes 4x H100 SXM GPUs for 48-72 hours; the table below uses the 60-hour midpoint.
| Provider | $/GPU/Hr | GPU-Hours (4 x 60) | Total Cost |
|---|---|---|---|
| io.net | $2.10 | 240 | $504 |
| RunPod (SXM) | $2.69 | 240 | $646 |
| Lambda Labs | $2.99 | 240 | $718 |
| AWS P5 | $3.90 | 240 | $936 |
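Part of why a 70B LoRA job fits on four GPUs is that only small adapter matrices are trained: LoRA learns a low-rank update B @ A for each adapted weight, adding rank x (d_in + d_out) trainable parameters per matrix, so gradients and optimizer state exist only for those. The sketch below counts them for a hypothetical Llama-style 70B configuration (hidden size 8192, 80 layers, grouped-query attention with 8 KV heads of dimension 128) with rank-16 adapters on the query and value projections; these shapes and choices are assumptions, not a description of the jobs priced above.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    # LoRA learns B @ A with A: (rank x d_in) and B: (d_out x rank).
    return rank * (d_in + d_out)


# Hypothetical Llama-style 70B shapes (assumptions, not from this guide).
HIDDEN, LAYERS, KV_DIM = 8192, 80, 1024  # KV_DIM = 8 KV heads x 128 head dim
RANK = 16

# Adapters on the query (HIDDEN -> HIDDEN) and value (HIDDEN -> KV_DIM) projections.
per_layer = lora_params(HIDDEN, HIDDEN, RANK) + lora_params(HIDDEN, KV_DIM, RANK)
trainable = per_layer * LAYERS
print(f"{trainable / 1e6:.1f}M trainable parameters ({100 * trainable / 70e9:.3f}% of 70B)")
```

The frozen 16-bit base weights (about 140 GB) still have to be sharded across GPUs, which is where the 4 x 80 GB comes in.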
High-Throughput Inference (Serving a 70B Model)
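Running a 70B model for production inference at ~500 requests/minute typically requires 2x H100 SXM GPUs running 24/7. Because FP16 weights alone occupy about 140 GB of the 160 GB available, this setup generally relies on FP8 or other quantized weights to leave room for the KV cache.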
| Provider | $/GPU/Hr | GPU-Hours/Month (2 x 720) | Monthly Cost |
|---|---|---|---|
| io.net | $2.10 | 1,440 | $3,024 |
| RunPod (SXM) | $2.69 | 1,440 | $3,874 |
| Lambda Labs | $2.99 | 1,440 | $4,306 |
| AWS P5 | $3.90 | 1,440 | $5,616 |
These estimates use the lowest published on-demand rates. Actual costs may vary based on storage, networking, and egress charges, which differ by provider.
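The 2-GPU sizing for serving a 70B model comes down to memory arithmetic: the weights plus the KV cache for every in-flight token must fit in 160 GB. The sketch below assumes a hypothetical Llama-style 70B layout (80 layers, 8 KV heads, head dimension 128) with a 16-bit KV cache, and ignores activations and runtime overhead, so real headroom is smaller.

```python
def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    # One key and one value vector per KV head, per layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_value


TOTAL_GB = 2 * 80   # two 80 GB H100s
PARAMS = 70e9
# Hypothetical Llama-style 70B layout with a 16-bit KV cache.
per_token = kv_cache_bytes_per_token(layers=80, kv_heads=8, head_dim=128)

for label, bytes_per_weight in [("FP16 weights", 2), ("FP8 weights", 1)]:
    weights_gb = PARAMS * bytes_per_weight / 1e9
    free_bytes = (TOTAL_GB - weights_gb) * 1e9
    tokens = int(free_bytes // per_token)
    print(f"{label}: {weights_gb:.0f} GB of weights, room for ~{tokens:,} cached tokens")
```

Under these assumptions, FP8 weights leave roughly 4.5x as much KV-cache room as FP16, which is what lets two GPUs keep many requests in flight at once.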
Availability & Waitlists -- The Real Situation in 2026
The GPU shortage of 2023-2024 has eased considerably. Here is the actual availability picture as of April 2026:
No waitlist:
- io.net -- Decentralized supply across 320,000+ GPUs means consistent availability. H100 clusters deploy in under 2 minutes.
- RunPod -- Community and Secure Cloud generally have H100 stock, though specific configurations may have short waits.
Generally available with some constraints:
- Lambda Labs -- Availability has improved but can still be limited during peak demand periods.
- AWS -- On-demand P5 availability varies by region. Capacity Blocks must be reserved in advance for a fixed time window rather than launched on demand.
Reserved / contract-based:
- CoreWeave -- Best availability comes through reserved contracts. On-demand access is more limited.
- Google Cloud -- A3 instances are available on-demand in most regions, but committed use pricing requires contracts.
The key shift in 2026: The supply-demand imbalance that defined 2023-2024 has normalized. You no longer need to join waitlists or commit to year-long contracts to access H100 hardware. Providers like io.net, which aggregate supply from a global network of GPU operators, have further reduced availability constraints by distributing supply geographically.
That said, if you need 100+ GPUs simultaneously for a multi-node training run, advance planning is still wise. For clusters of 8-32 GPUs, instant availability is the norm on most providers.
Frequently Asked Questions
How much does it cost to rent an H100 GPU per hour?
H100 GPU rental prices in 2026 range from $1.99/hr to $14.19/hr depending on the provider and variant. io.net offers the lowest H100 SXM pricing in this comparison at $2.10-$3.50/hr. RunPod starts at $1.99/hr for PCIe and $2.69/hr for SXM. Hyperscalers like AWS ($3.90-$6.88/hr) and Google Cloud ($9.80-$14.19/hr) charge significantly more. Among the specialized providers in this guide, on-demand H100 rates cluster between about $2.00 and $3.50/hr.
What is the difference between H100 SXM and H100 PCIe?
Both variants have 80 GB of memory, but SXM uses HBM3 at 3.35 TB/s while PCIe uses HBM2e at 2.0 TB/s. SXM also supports NVLink 4.0 at 900 GB/s for multi-GPU communication. SXM is essential for distributed training workloads where GPUs need to synchronize gradients frequently. PCIe is sufficient for single-GPU inference and fine-tuning tasks.
Is the H100 worth it over the A100?
For workloads that take more than 24 hours on an A100, the H100 usually wins on total cost: assuming its approximately 3x speedup, it is cheaper per job whenever its hourly rate is less than about 3x the A100's. A 72-hour A100 training run becomes ~24 hours on H100, often costing less in total. For shorter jobs, prototyping, or inference with smaller models, where the speedup is smaller or the dollar difference is negligible, the A100 at $1.00-$2.50/hr may be the better value.
Can I rent a single H100 GPU, or do I need a full cluster?
Most providers offer single-GPU instances. io.net, RunPod, and Lambda Labs all support configurations from 1 GPU up to multi-node clusters. For training workloads that benefit from multi-GPU parallelism, 8-GPU configurations with NVLink (SXM variant) provide the best performance per dollar.
How long does it take to deploy H100 GPUs in the cloud?
Deployment time varies significantly. io.net deploys H100 clusters in under 2 minutes thanks to its decentralized supply network. RunPod and Lambda Labs typically provision within 5-15 minutes. AWS Capacity Blocks must be reserved in advance for a fixed time window. CoreWeave reserved instances are provisioned through an enterprise onboarding process.
What is DePIN and how does it affect GPU pricing?
DePIN (Decentralized Physical Infrastructure Networks) is the model used by io.net, where GPU supply comes from a global network of independent operators rather than a single company owning all the hardware. This distributes capital expenditure risk across many participants, enabling lower pricing (at io.net's $2.10/hr floor, 46-85% below the AWS and Google Cloud on-demand rates in this guide), broader geographic availability (130+ countries), and elastic supply that scales without traditional procurement cycles.
How do I estimate my H100 cloud costs?
Calculate: (number of GPUs) x (hours per job) x (hourly rate). For training, estimate job duration based on your model size and dataset. For inference, calculate based on expected request volume and required throughput. Add 10-20% buffer for setup, debugging, and idle time between runs. Storage and egress fees vary by provider and should be factored separately.
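Here is that formula as a small helper, with the setup buffer applied. The function name and the 15% default are illustrative choices within the 10-20% range suggested above; storage and egress are passed in separately because they vary by provider.

```python
def estimate_job_cost(num_gpus: int, hours_per_job: float, rate_per_gpu_hour: float,
                      buffer: float = 0.15, storage_and_egress: float = 0.0) -> float:
    """GPUs x hours x rate, padded by a setup/debugging/idle buffer, plus
    storage and egress charges, which are priced separately by each provider."""
    compute = num_gpus * hours_per_job * rate_per_gpu_hour
    return compute * (1 + buffer) + storage_and_egress


# The 70B LoRA estimate from earlier in this guide: 4 GPUs x 60 hours at $2.10/hr.
low = estimate_job_cost(4, 60, 2.10, buffer=0.10)
high = estimate_job_cost(4, 60, 2.10, buffer=0.20)
print(f"${low:,.0f} - ${high:,.0f} before storage and egress")
```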
Which H100 cloud provider is best for startups?
Startups optimizing burn rate should prioritize providers with no minimum commitment, instant availability, and competitive pricing. io.net checks all three boxes: H100 SXM from $2.10/hr, deploy in under 2 minutes, no contracts required. RunPod's community cloud is also competitive for smaller workloads. Avoid locking into hyperscaler reserved instances until your GPU usage patterns are predictable.
Conclusion
The H100 cloud GPU rental landscape in 2026 is more competitive than ever. Prices have dropped to roughly a third of what they were two years ago, availability constraints have largely dissolved, and the range of providers gives teams real options beyond the traditional hyperscalers.
The pricing gap between providers is significant and can translate to tens of thousands of dollars on a single training run. An 8-GPU, two-week training job on io.net costs roughly $5,645, compared to $10,483 on AWS and $26,342 on Google Cloud. For teams running multiple training cycles per month, the difference compounds quickly.
For most teams evaluating H100 cloud options in 2026, here is the practical guidance:
- Start with io.net if cost efficiency and instant availability are priorities. At $2.10-$3.50/hr for H100 SXM with sub-2-minute deployment, it offers the best combination of price and accessibility in the market.
- Consider RunPod if you need serverless inference alongside on-demand training.
- Choose Lambda Labs if you value simplicity and operate primarily in the US.
- Evaluate CoreWeave or a hyperscaler if you need enterprise SLAs, existing ecosystem integration, or are operating at a scale where reserved pricing negotiations become meaningful.
The right provider depends on your workload, budget, team size, and existing infrastructure. But the days of paying hyperscaler premiums because there was no alternative are over.
Ready to deploy H100 GPUs at the lowest cost in the market? Explore io.net Cloud -- H100 SXM clusters from $2.10/hr, live in under 2 minutes.