Years after the H100 took the spotlight, the NVIDIA A100 remains the most-deployed data center GPU in the world. More AI models have been trained on A100s than any other accelerator in history, and in 2026 it is still the default choice for inference, fine-tuning, and a wide range of training workloads where raw cost-efficiency matters more than peak throughput.
The reason is straightforward: A100 cloud rental pricing has dropped 18% year-over-year while the hardware itself has only gotten more battle-tested. Every major ML framework, CUDA library, and deployment tool supports the A100. Apart from Hopper-only features such as FP8, there are few driver surprises or compatibility edge cases. For teams that need reliable GPU compute without paying the Hopper premium, the A100 is still the workhorse.
This guide covers everything you need to rent A100 GPUs in the cloud: the hardware specs that matter for your workload, a complete pricing comparison across every major provider, workload-matched cost estimates, and guidance on when the A100 is actually the better choice over newer silicon.
A100 Specifications: SXM vs PCIe, 40GB vs 80GB
The A100 ships in four configurations: two form factors (SXM4 and PCIe), each with 40GB or 80GB of memory. Understanding the differences determines which cloud instances to target and how much you should be paying.
Form Factors: SXM4 vs PCIe
A100 SXM4 is the high-performance variant designed for NVIDIA's HGX baseboards. It runs at up to 400W TDP, supports full NVLink interconnects at 600 GB/s GPU-to-GPU bandwidth, and delivers peak clock speeds across all precision formats. SXM4 is the configuration you want for multi-GPU training where GPUs need to communicate at high bandwidth.
A100 PCIe is the standard card form factor, running at 250-300W TDP. It fits into conventional server PCIe Gen4 slots, making it more flexible for existing infrastructure. PCIe models support NVLink bridge for dual-GPU setups but lack the full NVLink fabric of SXM4 systems. Performance is slightly lower due to power constraints.
Memory: 40GB vs 80GB
The 40GB variant uses HBM2 with 1,555 GB/s bandwidth. The 80GB variant uses HBM2e with up to 2,039 GB/s bandwidth. The 80GB model is the standard for cloud deployments in 2026 -- the extra VRAM and bandwidth make it suitable for larger batch sizes, longer context windows, and models above 13B parameters without quantization.
Key Specifications at a Glance
| Specification | A100 SXM4 80GB | A100 PCIe 80GB | A100 PCIe 40GB |
|---|---|---|---|
| VRAM | 80GB HBM2e | 80GB HBM2e | 40GB HBM2 |
| Memory Bandwidth | 2,039 GB/s | 1,935 GB/s | 1,555 GB/s |
| FP32 Performance | 19.5 TFLOPS | 19.5 TFLOPS | 19.5 TFLOPS |
| TF32 Tensor | 156 TFLOPS | 156 TFLOPS | 156 TFLOPS |
| FP16 Tensor | 312 TFLOPS | 312 TFLOPS | 312 TFLOPS |
| INT8 Tensor | 624 TOPS | 624 TOPS | 624 TOPS |
| GPU-to-GPU Interconnect | NVLink 600 GB/s | PCIe Gen4 (64 GB/s); optional NVLink bridge for 2 GPUs | PCIe Gen4 (64 GB/s); optional NVLink bridge for 2 GPUs |
| TDP | 400W | 300W | 250W |
| Multi-Instance GPU (MIG) | Up to 7 instances | Up to 7 instances | Up to 7 instances |
| Architecture | Ampere (GA100) | Ampere (GA100) | Ampere (GA100) |
The 80GB SXM4 is the gold standard for cloud rental. It delivers the highest memory bandwidth, full NVLink support for multi-GPU scaling, and the thermal headroom for sustained workloads. When you see "A100" in cloud provider listings without further qualification, it is usually this variant.
The 40GB PCIe variant still appears on some platforms at a lower price point. It is adequate for inference on quantized models under 30B parameters and for parameter-efficient fine-tuning (LoRA/QLoRA) on models under 13B, but the VRAM ceiling limits its utility for larger workloads.
Complete A100 Cloud Rental Pricing (April 2026)
A100 pricing varies by roughly 5x across providers, from about $0.90/hr to about $4.50/hr. The table below reflects current on-demand rates across every major cloud platform. All prices are per GPU, per hour.
A100 SXM 80GB -- The Standard Configuration
| Provider | Category | On-Demand $/hr | Spot / Auction | Min. Billing | Notes |
|---|---|---|---|---|---|
| AWS (P4d) | Hyperscaler | ~$4.50 | ~$1.50 | Per second (60 s minimum) | 8-GPU instances only; per-GPU cost derived; the 80GB A100 ships in P4de, while P4d carries the 40GB part |
| CoreWeave | Specialized | $2.06 | N/A | 1 minute | Kubernetes-native; reserved discounts available |
| RunPod (Secure) | Specialized | ~$2.20 | $1.39 (Community) | 1 minute | Community cloud = shared infra, lower price |
| Lambda Labs | Specialized | $1.29 | N/A | 1 minute | Often waitlisted; 1-click notebook launch |
| Vast.ai | Marketplace | ~$0.90 | Variable | Per second | Peer marketplace; pricing fluctuates with supply |
| io.net | Decentralized | $1.20-$2.00 | Marketplace | Per minute | 320K+ GPUs, deploys in <2 min, no egress fees |
A100 PCIe 40GB -- Budget Option
| Provider | Category | On-Demand $/hr | Notes |
|---|---|---|---|
| RunPod | Specialized | ~$1.22 | Limited availability |
| Vast.ai | Marketplace | ~$0.50-$0.80 | Supply-dependent pricing |
| io.net | Decentralized | $0.80-$1.50 | Available across distributed network |
Key takeaway: A100 SXM 80GB on-demand pricing clusters into three tiers. Hyperscalers charge $3.50-$4.50/hr. Specialized GPU clouds run $1.29-$2.20/hr. Decentralized networks and marketplaces deliver $0.90-$2.00/hr. The tier you choose depends on how much you value managed services, SLAs, and deployment convenience versus raw cost savings.
Provider Breakdown: Who Offers What
io.net -- Decentralized GPU Network
A100 SXM 80GB: $1.20-$2.00/hr | A100 PCIe 40GB: $0.80-$1.50/hr
io.net aggregates over 320,000 GPUs across 130+ countries through its decentralized physical infrastructure network (DePIN). A100 clusters deploy in under two minutes. There are no egress fees, no storage markups, and no minimum commitments.
Strengths: Lowest total cost of ownership when accounting for zero egress and storage fees. Per-minute billing. Global GPU availability means less waitlisting than centralized providers. Multi-GPU clusters available with high-bandwidth interconnects.
Best for: Teams optimizing for cost on fine-tuning, batch inference, and training jobs where hyperscaler SLAs are not required. Startups and researchers who want A100 access without signing enterprise contracts.
RunPod -- Flexible GPU Cloud
A100 SXM 80GB: ~$2.20/hr (Secure) | $1.39/hr (Community Cloud)
RunPod offers two tiers: Secure Cloud (enterprise-grade data centers) and Community Cloud (distributed infrastructure at lower cost). The community tier is popular for development and experimentation. RunPod also offers serverless GPU endpoints for inference scaling.
Strengths: Developer-friendly interface. Serverless inference product. Large template library for quick deployment. Active community and documentation.
Best for: Developers who want quick deployment for experiments and prototyping. Teams needing serverless inference auto-scaling.
Lambda Labs -- ML-Focused Cloud
A100 SXM 80GB: ~$1.29/hr
Lambda is built specifically for machine learning workloads. Pricing is competitive and transparent. Their one-click notebook environments and pre-configured ML stacks reduce setup friction. The trade-off is availability -- A100 instances are frequently waitlisted during peak demand.
Strengths: Clean ML-focused experience. Competitive pricing. Pre-installed frameworks.
Best for: ML engineers who can plan around availability constraints and want a streamlined training environment.
CoreWeave -- GPU-Native Cloud
A100 SXM 80GB: ~$2.06/hr
CoreWeave is a specialized GPU cloud built on Kubernetes. They offer bare-metal performance with cloud flexibility. Pricing is higher than marketplace options but includes enterprise features: reserved instances, dedicated clusters, and InfiniBand networking for multi-node training.
Strengths: Enterprise-grade infrastructure. Kubernetes-native. InfiniBand for multi-node workloads. Strong SLAs.
Best for: Teams running production training pipelines that need predictable performance and enterprise support.
AWS (P4d Instances) -- The Enterprise Default
A100 SXM 80GB: ~$4.50/hr per GPU
AWS packages A100s in P4d instances with 8 GPUs per node (P4d carries the 40GB A100; the 80GB variant is sold as P4de). Per-GPU cost is derived by dividing the instance price by eight. AWS offers the broadest ecosystem: SageMaker integration, S3 storage, VPC networking, IAM security. Pricing is the highest in the comparison, but enterprise procurement teams default to AWS for compliance and integration reasons.
Strengths: Deepest ecosystem integration. Strongest compliance certifications. Global region availability. Spot instances at ~$1.50/hr when available.
Best for: Enterprises with existing AWS infrastructure, compliance requirements, and procurement processes that mandate hyperscaler vendors.
Vast.ai -- GPU Marketplace
A100 SXM 80GB: from ~$0.90/hr (variable)
Vast.ai operates a peer marketplace where GPU owners list capacity at market-determined prices. A100 pricing fluctuates with supply and demand. Per-second billing and the ability to bid on instances make it the cheapest option for interruptible workloads.
Strengths: Lowest spot-like pricing. Per-second billing. Price transparency via marketplace.
Best for: Cost-sensitive researchers running interruptible batch jobs. Experimentation workloads where price matters more than consistency.

A100 vs H100: When the A100 Is the Better Choice
The H100 is faster. That is not in dispute. With FP8 support, Transformer Engine, and roughly 1.6x the memory bandwidth of the A100 (3.35 TB/s vs 2.04 TB/s on the SXM parts), the H100 finishes most training jobs 2-2.5x faster. But faster per hour does not always mean cheaper per job.
The Cost-Per-Job Calculation
| Metric | A100 SXM 80GB | H100 SXM 80GB |
|---|---|---|
| Cloud price (typical) | $1.20-$2.00/hr | $2.50-$3.50/hr |
| FP16 Tensor performance | 312 TFLOPS | 990 TFLOPS |
| Speedup factor | 1x (baseline) | ~2-2.5x |
| Cost to complete same job | 1x | 0.7-1.0x |
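The last row of the table follows from the two rows above it: the H100 costs more per hour but runs for fewer hours. A minimal sketch of that arithmetic, assuming the midpoints of the quoted price ranges ($1.60/hr and $3.00/hr are illustrative picks, not quoted rates) and a hypothetical helper named `relative_job_cost`:

```python
def relative_job_cost(a100_rate, h100_rate, speedup):
    """Cost of a job on the H100 relative to the same job on the A100.

    A value below 1.0 means the H100 is cheaper per completed job.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    # The H100 runs for 1/speedup of the A100's hours, at a higher hourly rate.
    return (h100_rate / a100_rate) / speedup


# Assumed midpoints of the price ranges in the table above.
A100_RATE = 1.60  # $/hr
H100_RATE = 3.00  # $/hr

for speedup in (2.0, 2.5):
    ratio = relative_job_cost(A100_RATE, H100_RATE, speedup)
    print(f"speedup {speedup}x: H100 job cost is {ratio:.2f}x the A100 job cost")
```

With these midpoints the ratio lands between 0.75 and 0.94, inside the 0.7-1.0x band in the table. At the extremes of the quoted ranges it can swing from about 0.5x to about 1.46x, so it is worth plugging in the actual rates you are offered.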
For compute-bound training at scale, the H100 is often cheaper per job despite the higher hourly rate. But there are several scenarios where the A100 wins on total cost:
Inference workloads. Most inference is memory-bandwidth-bound, not compute-bound. The A100's 80GB HBM2e with 2,039 GB/s bandwidth handles LLM inference at 70-85% of H100 throughput for 40-60% of the hourly cost. For serving Llama 3 70B or similar models, A100s deliver better cost-per-token.
Fine-tuning with LoRA/QLoRA. Parameter-efficient fine-tuning methods use a fraction of the GPU's compute capacity. Paying for H100 compute that sits idle during adapter training wastes money. A100s are the right tool for the job.
Memory-bound workloads. Large batch inference, long-context processing, and workloads limited by VRAM rather than compute see minimal speedup from the H100. The A100 gives you 80GB at a lower price.
Development and experimentation. Iteration speed during development is constrained by human thinking time, not GPU compute. Paying roughly twice the hourly rate for an H100 during interactive development burns budget with no return.
Mature software stacks. Codebases optimized for Ampere architecture, including custom CUDA kernels and quantization schemes designed for A100, may not immediately benefit from Hopper features. Migration cost is real.
Bottom line: If your workload is compute-bound and you are training at scale, the H100 is likely cheaper per job. For everything else -- inference, fine-tuning, development, memory-bound workloads -- the A100 at $1.20-$2.00/hr is the more cost-effective choice.
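The inference claim above (70-85% of H100 throughput at 40-60% of the hourly cost) can be turned into a relative cost per token by dividing one fraction by the other. The sketch below is only that division; the function name is hypothetical and the inputs are the ranges quoted in this section, not measurements.

```python
def relative_cost_per_token(hourly_cost_fraction, throughput_fraction):
    """A100 cost per token relative to the H100 (below 1.0 favors the A100).

    hourly_cost_fraction: A100 hourly price divided by H100 hourly price.
    throughput_fraction:  A100 tokens/s divided by H100 tokens/s.
    """
    if throughput_fraction <= 0:
        raise ValueError("throughput_fraction must be positive")
    return hourly_cost_fraction / throughput_fraction


# Best case for the A100: cheapest relative price, highest relative throughput.
best = relative_cost_per_token(0.40, 0.85)
# Worst case for the A100: highest relative price, lowest relative throughput.
worst = relative_cost_per_token(0.60, 0.70)

print(f"A100 cost per token: {best:.2f}x to {worst:.2f}x the H100's")
```

Under those quoted ranges the A100 stays below 1.0x across the whole band, which is the basis for the cost-per-token argument. If your own benchmark shows the A100 below about 60% of H100 throughput at 60% of the price, the advantage disappears.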
A100 vs Consumer GPUs: When RTX 4090 Is Enough
The RTX 4090 has become the budget darling of the AI community, available on cloud platforms for $0.20-$0.44/hr. Before renting an A100, consider whether a consumer GPU covers your needs.
| Specification | A100 SXM 80GB | RTX 4090 |
|---|---|---|
| VRAM | 80GB HBM2e | 24GB GDDR6X |
| Memory Bandwidth | 2,039 GB/s | 1,008 GB/s |
| FP16 Tensor | 312 TFLOPS | 330 TFLOPS |
| Cloud price | $1.20-$2.00/hr | $0.20-$0.44/hr |
| Multi-GPU scaling | NVLink (600 GB/s) | PCIe only |
| MIG support | Yes (7 partitions) | No |
| ECC memory | Yes | No |
Choose the RTX 4090 when:
- Your model fits in 24GB VRAM (most 7B models, quantized 13B models)
- You are fine-tuning with QLoRA on models up to 13B parameters
- You are running inference on a single model under 13B parameters
- Budget is the primary constraint and you need maximum GPU hours per dollar
Choose the A100 when:
- Your workload requires more than 24GB VRAM
- You need multi-GPU scaling with NVLink interconnects
- You are running multi-tenant inference with MIG partitioning
- ECC memory is required for production reliability
- You are training or fine-tuning models above 13B parameters
The price difference is 3-5x, so using an A100 when a 4090 would suffice is one of the most common sources of wasted GPU spend.
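The two checklists above reduce to a simple rule: any single A100-only requirement forces the A100, otherwise the RTX 4090 is enough. A minimal sketch of that rule, assuming you already know your peak per-GPU VRAM requirement; `Workload` and `pick_gpu` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

RTX_4090_VRAM_GB = 24  # from the comparison table above


@dataclass
class Workload:
    vram_needed_gb: float        # peak VRAM the job needs on one GPU
    needs_nvlink: bool = False   # multi-GPU scaling over NVLink
    needs_mig: bool = False      # multi-tenant inference with MIG partitions
    needs_ecc: bool = False      # ECC memory required for production


def pick_gpu(w: Workload) -> tuple[str, list[str]]:
    """Return the suggested GPU and the reasons that forced an A100, if any."""
    reasons = []
    if w.vram_needed_gb > RTX_4090_VRAM_GB:
        reasons.append(f"needs more than {RTX_4090_VRAM_GB} GB VRAM")
    if w.needs_nvlink:
        reasons.append("NVLink scaling")
    if w.needs_mig:
        reasons.append("MIG partitioning")
    if w.needs_ecc:
        reasons.append("ECC memory")
    return ("A100" if reasons else "RTX 4090", reasons)


print(pick_gpu(Workload(vram_needed_gb=16)))
print(pick_gpu(Workload(vram_needed_gb=16, needs_mig=True)))
print(pick_gpu(Workload(vram_needed_gb=48, needs_ecc=True)))
```

Returning the reasons alongside the choice makes it easy to see which requirement is responsible for the more expensive GPU and whether it is truly non-negotiable.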
Common A100 Workloads and Cost Estimates
Here is what real A100 workloads cost across different providers. All estimates assume A100 SXM 80GB.
Fine-Tuning a 7B Model (LoRA, ~8 hours)
| Provider | $/hr | Total Cost |
|---|---|---|
| AWS (P4d) | $4.50 | $36.00 |
| CoreWeave | $2.06 | $16.48 |
| Lambda Labs | $1.29 | $10.32 |
| io.net | $1.20 | $9.60 |
| Vast.ai | $0.90 | $7.20 |
Fine-Tuning a 70B Model (4x A100, ~48 hours)
| Provider | $/hr (4 GPUs) | Total Cost |
|---|---|---|
| AWS (P4d) | $18.00 | $864.00 |
| CoreWeave | $8.24 | $395.52 |
| Lambda Labs | $5.16 | $247.68 |
| io.net | $4.80 | $230.40 |
| Vast.ai | $3.60 | $172.80 |
Serving Llama 3 70B Inference (1 month, single A100, quantized model)
| Provider | $/hr | Monthly Cost |
|---|---|---|
| AWS (P4d) | $4.50 | $3,240 |
| CoreWeave | $2.06 | $1,483 |
| RunPod (Community) | $1.39 | $1,001 |
| io.net | $1.50 | $1,080 |
| Vast.ai | $0.90 | $648 |
Batch Inference Pipeline (100 hours/month)
| Provider | $/hr | Monthly Cost |
|---|---|---|
| AWS Spot | $1.50 | $150 |
| Lambda Labs | $1.29 | $129 |
| io.net | $1.20 | $120 |
| Vast.ai | $0.90 | $90 |
Note: These estimates reflect compute costs only. Add 20-40% for hyperscaler egress, storage, and networking overhead. Decentralized networks like io.net have no egress fees, so the compute price more closely reflects total cost.
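Every figure in the tables above is hourly rate times GPU count times hours, and the note's 20-40% overhead is a multiplier on top. A short sketch for reproducing the tables or substituting your own numbers, assuming a 720-hour month for always-on serving; the function names are hypothetical.

```python
HOURS_PER_MONTH = 720  # 30 days x 24 hours, as used in the serving table


def compute_cost(rate_per_gpu_hr, hours, gpus=1):
    """Compute-only cost in dollars."""
    return rate_per_gpu_hr * gpus * hours


def with_overhead(cost, overhead_fraction):
    """Add egress/storage/networking overhead, e.g. 0.20 for 20%."""
    return cost * (1 + overhead_fraction)


# 70B fine-tune on 4 GPUs for 48 hours at $1.29 per GPU-hour.
finetune = compute_cost(1.29, hours=48, gpus=4)
print(f"70B fine-tune: ${finetune:,.2f}")

# One GPU serving for a month at $2.06/hr.
serving = compute_cost(2.06, hours=HOURS_PER_MONTH)
print(f"Monthly serving: ${serving:,.2f}")

# Hyperscaler fine-tune at $4.50 per GPU-hour with 20-40% overhead added.
aws = compute_cost(4.50, hours=48, gpus=4)
low, high = with_overhead(aws, 0.20), with_overhead(aws, 0.40)
print(f"Hyperscaler fine-tune with overhead: ${low:,.2f} to ${high:,.2f}")
```

The overhead range is the article's own estimate; for a real budget, replace it with the egress and storage line items from your provider's bill.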
How to Choose an A100 Cloud Provider
Selecting the right A100 cloud rental comes down to five factors. Weight them based on your team's priorities.
1. Total Cost of Ownership, Not Just $/hr
The sticker price tells half the story. Factor in data egress ($0.08-$0.12/GB on hyperscalers, free on io.net), persistent storage charges, and minimum billing increments. A provider at $1.50/hr with no hidden fees can be cheaper than one at $1.29/hr with $0.10/GB egress on a data-heavy workload.
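To make the comparison above concrete, the sketch below adds egress and billing-increment rounding to the hourly rate and solves for the egress volume at which the two example providers cost the same. The function names are hypothetical and the 100-hour job length is an assumption chosen for illustration.

```python
import math


def billed_hours(runtime_minutes, increment_minutes=1):
    """Round runtime up to the provider's billing increment, in hours."""
    increments = math.ceil(runtime_minutes / increment_minutes)
    return increments * increment_minutes / 60


def total_cost(rate_per_hr, runtime_minutes, increment_minutes=1,
               egress_gb=0.0, egress_per_gb=0.0):
    """Compute plus egress cost in dollars (storage omitted for brevity)."""
    compute = rate_per_hr * billed_hours(runtime_minutes, increment_minutes)
    return compute + egress_gb * egress_per_gb


def breakeven_egress_gb(metered_rate, flat_rate, hours, egress_per_gb):
    """Egress volume at which the cheaper-per-hour metered provider stops winning."""
    return (flat_rate - metered_rate) * hours / egress_per_gb


# The example from the text: $1.50/hr with no egress fee vs $1.29/hr plus $0.10/GB.
hours = 100  # assumed job length
gb = breakeven_egress_gb(1.29, 1.50, hours, egress_per_gb=0.10)
print(f"Break-even egress over {hours} hours: about {gb:.0f} GB")

# A 10-minute job billed hourly vs per minute at the same $2.00/hr rate.
print(f"Hourly billing:     ${total_cost(2.00, 10, increment_minutes=60):.2f}")
print(f"Per-minute billing: ${total_cost(2.00, 10, increment_minutes=1):.2f}")
```

For a 100-hour job the $0.21/hr gap is worth $21, which $0.10/GB egress consumes after roughly 210 GB. Moving more data than that out of the metered provider makes the $1.50/hr option cheaper overall.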
2. Availability and Waitlists
The cheapest provider is useless if you cannot get an instance. Lambda Labs frequently waitlists A100 availability. AWS spot instances are not always available in your region. Decentralized networks like io.net draw from 320,000+ GPUs across 130+ countries, reducing the likelihood of capacity constraints.
3. Multi-GPU Scaling
If your workload requires 4-8 A100s with NVLink interconnects, your options narrow. AWS P4d offers 8-GPU nodes. CoreWeave provides InfiniBand-connected clusters. io.net supports multi-GPU cluster deployments. Marketplace providers like Vast.ai primarily offer single-GPU instances.
4. Deployment Speed and Developer Experience
How fast can you go from "I need a GPU" to "my code is running"? Lambda and RunPod offer one-click notebook environments. io.net deploys clusters in under two minutes. AWS requires VPC setup, IAM roles, and instance configuration. If you are iterating quickly, deployment friction matters.
5. SLA and Support Requirements
Enterprise teams with production SLA requirements lean toward AWS, CoreWeave, or managed offerings. Research teams and startups optimizing for cost over guarantees benefit from marketplace and decentralized options. Match your provider to your actual reliability needs -- not aspirational ones.
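One way to weight the five factors is a simple weighted score per provider. The sketch below shows the mechanics only: the weights, the 1-5 scores, and the provider names ("Provider A", "Provider B") are all hypothetical placeholders, not ratings of any real service.

```python
FACTORS = ("cost", "availability", "multi_gpu", "dev_experience", "sla")


def weighted_score(scores, weights):
    """Weighted average of per-factor scores; weights need not sum to 1."""
    missing = [f for f in FACTORS if f not in scores or f not in weights]
    if missing:
        raise KeyError(f"missing factors: {missing}")
    total_weight = sum(weights[f] for f in FACTORS)
    return sum(weights[f] * scores[f] for f in FACTORS) / total_weight


# Hypothetical priorities for a cost-focused research team.
weights = {"cost": 0.4, "availability": 0.2, "multi_gpu": 0.1,
           "dev_experience": 0.2, "sla": 0.1}

# Hypothetical 1-5 scores; fill these in from your own evaluation.
candidates = {
    "Provider A": {"cost": 5, "availability": 3, "multi_gpu": 2,
                   "dev_experience": 4, "sla": 2},
    "Provider B": {"cost": 2, "availability": 4, "multi_gpu": 5,
                   "dev_experience": 3, "sla": 5},
}

ranked = sorted(candidates, key=lambda p: weighted_score(candidates[p], weights),
                reverse=True)
for name in ranked:
    print(f"{name}: {weighted_score(candidates[name], weights):.2f}")
```

Changing the weights to favor SLA and multi-GPU scaling flips the ranking in this toy example, which is the point: the right provider depends on which factors your team actually weights most.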
Frequently Asked Questions
How much does it cost to rent an A100 GPU per hour?
A100 SXM 80GB cloud rental ranges from $0.90/hr (Vast.ai marketplace) to $4.50/hr (AWS P4d on-demand). The most common price range across specialized and decentralized providers is $1.20-$2.20/hr. Pricing has declined roughly 18% year-over-year as H100 and H200 supply has expanded.
What is the difference between A100 SXM and PCIe?
The SXM4 form factor runs at higher power (400W vs 250-300W), supports full NVLink interconnects at 600 GB/s for multi-GPU communication, and fits into NVIDIA HGX baseboards. The PCIe variant plugs into standard server slots and uses PCIe Gen4 at 64 GB/s. For multi-GPU training workloads, SXM is significantly better. For single-GPU inference, the difference is minimal.
Is the A100 40GB or 80GB better for AI workloads?
The 80GB variant is the standard for 2026. It supports larger models, bigger batch sizes, and longer context windows. The 40GB variant is adequate for inference on quantized models under 30B parameters and for parameter-efficient fine-tuning on models under 13B, but the VRAM ceiling limits flexibility. The price difference ($0.80-$1.50 vs $1.20-$2.00/hr) is modest enough that the 80GB is usually worth the premium.
Should I rent an A100 or an H100?
Rent an A100 for inference, fine-tuning, development, and memory-bound workloads where the lower hourly rate ($1.20-$2.00 vs $2.50-$3.50) outweighs the H100's speed advantage. Rent an H100 for compute-bound training at scale where the 2-2.5x speedup makes it cheaper per completed job despite the higher hourly price.
Can I rent A100 GPUs without a long-term commitment?
Yes. Most specialized and decentralized providers offer on-demand A100 rental with no minimum commitment. io.net bills per minute with no contracts. RunPod and Lambda Labs bill per minute. Vast.ai bills per second. Only hyperscalers push toward reserved instance commitments for discount pricing.
How many A100 GPUs do I need for my workload?
For inference on models up to 70B parameters: 1 A100 80GB (quantized) or 2 A100s (FP16/BF16). For fine-tuning 7-13B models: 1 A100. For fine-tuning 70B models: 4 A100s. For pre-training runs: 8-64+ A100s depending on model size and target training time. As a rule of thumb, the parameter count in billions times two roughly equals the minimum VRAM in GB needed to hold the weights at 16-bit precision (FP16/BF16); KV cache and activations come on top of that.
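The rule of thumb above is two bytes per parameter, and it generalizes to other precisions by changing the bytes-per-parameter figure. A minimal sketch, covering model weights only; KV cache, activations, and optimizer state are not included, and the `headroom` parameter is an assumed knob for approximating them.

```python
import math

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weights_vram_gb(params_billion, precision="fp16"):
    """VRAM in GB for the weights alone (1B params x 1 byte = 1 GB)."""
    return params_billion * BYTES_PER_PARAM[precision]


def min_gpus(params_billion, precision="fp16", gpu_vram_gb=80, headroom=0.0):
    """Smallest GPU count whose combined VRAM holds the weights plus headroom."""
    needed = weights_vram_gb(params_billion, precision) * (1 + headroom)
    return max(1, math.ceil(needed / gpu_vram_gb))


for precision in ("fp16", "int8", "int4"):
    gb = weights_vram_gb(70, precision)
    print(f"70B at {precision}: {gb:.0f} GB of weights, "
          f"{min_gpus(70, precision)} x A100 80GB minimum")
```

For a 70B model this gives 140 GB at FP16 (two A100 80GB cards) and 35 GB at 4-bit (one card), matching the guidance above. Setting `headroom` to something like 0.2 gives a more conservative count for long-context serving.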
Are decentralized GPU clouds reliable for A100 workloads?
In 2026, yes -- for the right workloads. Networks like io.net implement hardware verification, uptime monitoring, and cluster orchestration. For training, fine-tuning, and batch inference, decentralized A100 compute is production-ready. For latency-critical production inference with strict SLA requirements below 100ms P99, centralized providers still have an edge.
Conclusion
The A100 remains the best price-performance GPU for the majority of AI workloads in 2026. It is not the fastest, but it is the most proven, the most supported, and -- at $1.20-$2.00/hr on decentralized and specialized clouds -- the most cost-efficient path to production-grade GPU compute.
For teams evaluating A100 cloud rental, the decision framework is straightforward:
- Optimize for cost: io.net or Vast.ai at $0.90-$2.00/hr with no hidden fees
- Optimize for developer experience: Lambda Labs or RunPod at $1.29-$2.20/hr
- Optimize for enterprise SLAs: CoreWeave or AWS at $2.06-$4.50/hr
The biggest mistake teams make is defaulting to a hyperscaler out of habit and paying 3-4x the market rate for equivalent hardware. The second biggest mistake is renting an H100 for workloads where the A100 delivers the same result at half the price.
io.net gives you access to A100 SXM 80GB clusters starting at $1.20/hr, deployed in under two minutes, with no egress fees and no minimum commitments. For teams spending $5,000 or more per month on GPU compute, switching A100 workloads to io.net is the single highest-impact cost optimization available.