The NVIDIA H100 comes in two form factors: SXM (a socketed module that mounts on an HGX baseboard) and PCIe (a standard PCI Express add-in card). The SXM version has a 700W TDP and a 900 GB/s NVLink interconnect, and is designed for multi-GPU training in dense server configurations. The PCIe version has a 350W TDP and no NVLink Switch fabric (an optional NVLink bridge can only link cards in pairs), and is best suited to single-GPU inference or installation in standard servers.

In multi-GPU training, SXM delivers roughly 40-60% higher throughput because NVLink provides far more GPU-to-GPU bandwidth (900 GB/s vs. roughly 64 GB/s per direction over PCIe 5.0 x16). However, PCIe is typically 20-30% cheaper, more widely available, and perfectly adequate for inference, fine-tuning smaller models, and other single-GPU workloads.
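To see why interconnect bandwidth matters so much for training, here is a rough back-of-the-envelope sketch of gradient synchronization time. It uses the standard ring all-reduce traffic estimate of 2(N-1)/N times the payload per GPU. The 140 GB payload (70B parameters in 16-bit precision) and the function name are illustrative assumptions, and the model ignores latency, protocol overhead, and compute/communication overlap.

```python
def ring_allreduce_seconds(payload_gb: float, num_gpus: int, link_gb_per_s: float) -> float:
    """Idealized time for one ring all-reduce of `payload_gb` across `num_gpus`.

    In a ring all-reduce each GPU transfers 2 * (N - 1) / N times the payload.
    Latency and overlap with compute are ignored (simplifying assumption).
    """
    traffic_gb = 2 * (num_gpus - 1) / num_gpus * payload_gb
    return traffic_gb / link_gb_per_s


# Assumed payload: 70B parameters x 2 bytes (16-bit gradients) = 140 GB.
GRADIENT_GB = 140.0

nvlink_s = ring_allreduce_seconds(GRADIENT_GB, num_gpus=8, link_gb_per_s=900.0)
pcie_s = ring_allreduce_seconds(GRADIENT_GB, num_gpus=8, link_gb_per_s=64.0)

print(f"NVLink: {nvlink_s:.2f} s per sync, PCIe: {pcie_s:.2f} s per sync")
```

The raw bandwidth ratio is about 14x, yet the end-to-end training gap is closer to 40-60%. Communication is only one part of each training step, and frameworks overlap much of it with compute.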

Technical Specifications Comparison

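| Specification | H100 SXM | H100 PCIe |
|---|---|---|
| Form Factor | Proprietary SXM5 module | Standard PCIe Gen 5 x16 card |
| TDP (Power) | 700W | 350W |
| GPU Memory | 80GB HBM3 | 80GB HBM2e |
| Memory Bandwidth | 3.35 TB/s | 2.0 TB/s |
| NVLink | 900 GB/s (18 links) | None (optional NVLink bridge connects cards in pairs at 600 GB/s) |
| PCIe Interface | PCIe Gen 5 x16 | PCIe Gen 5 x16 |
| Compute (FP16 Tensor Core, with sparsity) | 1,979 TFLOPS | 1,513 TFLOPS |
| Compute (INT8 Tensor Core, with sparsity) | 3,958 TOPS | 3,026 TOPS |
| Chassis | Requires HGX H100 server | Standard dual-slot card |
| Price (io.net) | $2.20/hour | $1.49/hour |
| Availability | Limited (data center only) | More widely available |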

Performance Differences

Multi-GPU Training (8x H100):

Llama 3 70B Full Fine-Tuning (10K samples, 3 epochs)

SXM Configuration:
- 8x H100 SXM with NVLink (900 GB/s interconnect)
- Training time: 24 hours
- Throughput: 12,500 tokens/second
- Cost: $17.60/hour × 24 = $422
- GPU utilization: 92%

PCIe Configuration:
- 8x H100 PCIe without NVLink (64 GB/s PCIe 5.0)
- Training time: 38 hours (58% longer)
- Throughput: 7,900 tokens/second
- Cost: $11.92/hour × 38 = $453
- GPU utilization: 78% (communication bottleneck)

Winner: SXM (faster completion, lower total cost despite higher hourly rate)
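The cost lines above are simply per-GPU hourly rate times GPU count times wall-clock hours. A minimal sketch that reproduces them, using the io.net per-GPU rates quoted in this article (the function name is illustrative):

```python
def job_cost(rate_per_gpu_hour: float, num_gpus: int, hours: float) -> float:
    """Total rental cost of a job: per-GPU hourly rate x GPU count x hours."""
    return rate_per_gpu_hour * num_gpus * hours


# Rates and durations are the figures quoted in the comparison above.
sxm_cost = job_cost(2.20, num_gpus=8, hours=24)
pcie_cost = job_cost(1.49, num_gpus=8, hours=38)

print(f"SXM:  ${sxm_cost:,.2f}")
print(f"PCIe: ${pcie_cost:,.2f}")
```

Plugging in your own measured training time for each configuration is the quickest way to check whether the higher hourly rate pays for itself.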

Single-GPU Inference:

Llama 3 8B Inference Serving (vLLM)

SXM Configuration:
- 1x H100 SXM
- Throughput: 85 tokens/second
- Latency (TTFT): 18ms
- Cost: $2.20/hour
- Power: 700W

PCIe Configuration:
- 1x H100 PCIe
- Throughput: 82 tokens/second (4% slower)
- Latency (TTFT): 19ms
- Cost: $1.49/hour (32% cheaper)
- Power: 350W

Winner: PCIe (nearly identical performance, significantly cheaper)
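For inference, a more useful comparison than the hourly rate is cost per token served. The sketch below converts the throughput and price figures above into dollars per million tokens. It assumes the quoted throughput is sustained for the whole hour, which is a simplification, and the function name is illustrative.

```python
def cost_per_million_tokens(rate_per_hour: float, tokens_per_second: float) -> float:
    """Dollars per one million generated tokens at a sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return rate_per_hour / tokens_per_hour * 1_000_000


# Throughput and price figures are taken from the comparison above.
sxm_cpm = cost_per_million_tokens(2.20, tokens_per_second=85)
pcie_cpm = cost_per_million_tokens(1.49, tokens_per_second=82)

print(f"SXM:  ${sxm_cpm:.2f} per million tokens")
print(f"PCIe: ${pcie_cpm:.2f} per million tokens")
```

Batched serving raises throughput on both cards, so the absolute numbers will differ in production. The ratio between the two is what matters for the comparison.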

When to Use Each

Choose H100 SXM When:

  1. Multi-GPU Training (4+ GPUs)
    - Large language model training (70B+ parameters)
    - Distributed training with high inter-GPU communication
    - Model parallelism (tensor parallelism, pipeline parallelism)
    - Example: Training GPT-style models, Stable Diffusion XL fine-tuning
  2. Maximum Performance Required
    - Time-sensitive research deadlines
    - Production training pipelines where speed matters
    - Benchmarking and ML competitions
  3. High-Throughput Batch Inference
    - Serving large models (70B+) with high concurrency
    - Batch processing millions of requests per day
    - Example: Enterprise LLM API serving at scale

Choose H100 PCIe When:

  1. Single-GPU Workloads
    - Inference serving for 7B-13B models
    - Fine-tuning smaller models (<30B parameters)
    - Development and experimentation
    - Example: Hosting Llama 3 8B API endpoint
  2. Cost Optimization
    - Budget-constrained projects
    - Long-running inference workloads (cost adds up)
    - Example: 24/7 chatbot serving ($1.49/hr × 730 hrs = $1,088/month vs. $1,606/month for SXM)
  3. Standard Server Infrastructure
    - Deploying in existing PCIe servers
    - Edge deployments without HGX chassis
    - Easier hardware sourcing and flexibility
  4. Power-Constrained Environments
    - Colocation facilities with power limits
    - Sustainability initiatives (50% lower per-GPU TDP)

Architecture Differences

H100 SXM Architecture:

HGX H100 Server Chassis:
┌─────────────────────────────────┐
│  8x H100 SXM modules            │
│  ├─ NVLink Switch (900 GB/s)    │
│  ├─ All-to-all interconnect     │
│  └─ Shared power/cooling        │
│                                 │
│  Requirements:                  │
│  - HGX baseboard (~$50K)        │
│  - 8x 700W GPUs (5.6 kW total)  │
│  - Air or liquid cooling        │
│  - Specialized chassis          │
└─────────────────────────────────┘

H100 PCIe Architecture:

Standard Server:
┌─────────────────────────────────┐
│  8x H100 PCIe cards             │
│  ├─ PCIe 5.0 x16 per GPU        │
│  ├─ No direct GPU interconnect  │
│  └─ Air-cooled                  │
│                                 │
│  Requirements:                  │
│  - Standard server chassis      │
│  - 8x 350W GPUs (2.8 kW total)  │
│  - Air cooling (fans)           │
│  - Commodity hardware           │
└─────────────────────────────────┘
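The power figures in the two diagrams are GPU TDP multiplied by GPU count. Because the PCIe configuration also runs longer, a fairer comparison is energy per job. The sketch below combines the TDP figures with the 24-hour and 38-hour training times from the Llama 3 70B fine-tuning example. It assumes GPUs draw full TDP for the whole run and ignores CPUs, fans, and cooling overhead, so it is an estimate only.

```python
def job_energy_kwh(gpu_tdp_watts: float, num_gpus: int, hours: float) -> float:
    """GPU-only energy for a job, assuming every GPU runs at full TDP."""
    return gpu_tdp_watts * num_gpus * hours / 1000


# TDP and duration figures come from this article; full-TDP draw is an assumption.
sxm_kwh = job_energy_kwh(700, num_gpus=8, hours=24)
pcie_kwh = job_energy_kwh(350, num_gpus=8, hours=38)

print(f"SXM:  {sxm_kwh:.1f} kWh")
print(f"PCIe: {pcie_kwh:.1f} kWh")
```

Under these assumptions the PCIe run uses roughly 21% less GPU energy than the SXM run, not 50% less, because the longer runtime eats into the lower power draw.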

Real-World Benchmarks

Multi-GPU Training Performance:

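| Workload | Metric | 8x SXM | 8x PCIe | SXM Advantage |
|---|---|---|---|---|
| Llama 3 70B training | Samples/sec | 185 | 117 | +58% |
| Stable Diffusion XL | Images/sec | 420 | 290 | +45% |
| GPT-NeoX 20B | Tokens/sec | 28,500 | 19,200 | +48% |
| BERT fine-tuning | Steps/sec | 145 | 112 | +29% |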
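The "SXM Advantage" column is the relative throughput gain of SXM over PCIe. A short sketch that recomputes it from the raw figures in the table (the dictionary and function names are illustrative):

```python
# (8x SXM throughput, 8x PCIe throughput) as listed in the table above.
benchmarks = {
    "Llama 3 70B training": (185, 117),
    "Stable Diffusion XL": (420, 290),
    "GPT-NeoX 20B": (28_500, 19_200),
    "BERT fine-tuning": (145, 112),
}


def sxm_advantage_pct(sxm: float, pcie: float) -> float:
    """Relative throughput gain of SXM over PCIe, in percent."""
    return (sxm / pcie - 1) * 100


advantages = {
    name: round(sxm_advantage_pct(sxm, pcie))
    for name, (sxm, pcie) in benchmarks.items()
}

for name, pct in advantages.items():
    print(f"{name}: +{pct}%")
```

The same helper works for your own measurements if you benchmark both form factors on a representative workload before committing to one.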

Single-GPU Inference Performance:

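| Model | Metric | SXM | PCIe | Difference |
|---|---|---|---|---|
| Llama 3 8B | Tokens/sec | 85 | 82 | +4% |
| Llama 3 70B | Tokens/sec | 38 | 36 | +6% |
| Mistral 7B | Tokens/sec | 92 | 89 | +3% |
| SD 1.5 | Images/sec | 12 | 11.5 | +4% |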

Verdict: SXM provides minimal advantage for single-GPU workloads (not worth the 48% price premium).

Memory Bandwidth Impact

SXM: 3.35 TB/s HBM3 Bandwidth
- Faster weight loading from VRAM
- Better performance on memory-bound operations
- Advantage for inference with large batch sizes

PCIe: 2.0 TB/s HBM2e Bandwidth
- Sufficient for most workloads
- Bottleneck only in extreme cases (very large batches)

Real-World Impact:
- Inference: 3-5% performance difference
- Training: 8-12% difference (memory-bound operations)
- Fine-tuning: 5-8% difference

Cost Analysis: 30-Day Deployment

Inference Workload (24/7 Llama 3 8B serving):

SXM: $2.20/hour × 720 hours = $1,584
PCIe: $1.49/hour × 720 hours = $1,073
Savings: $511/month (32% cheaper)

Performance difference: 4% (negligible for inference)
Winner: PCIe (massive cost savings, minimal performance loss)

Training Workload (8x GPU, 200 hours/month):

SXM: $17.60/hour × 200 hours = $3,520
PCIe: $11.92/hour × 315 hours = $3,755
(PCIe takes about 58% longer to complete the same training)

Winner: SXM (faster time-to-result, lower total cost despite higher hourly rate)
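The training result generalizes into a simple break-even rule: PCIe has the lower total cost only when its slowdown factor is smaller than the ratio of the two hourly rates. The sketch below works this out from the io.net rates used in this article. The function names are illustrative, and it assumes both configurations are billed purely by the hour.

```python
def break_even_slowdown(sxm_rate: float, pcie_rate: float) -> float:
    """Slowdown factor at which PCIe and SXM cost the same in total."""
    return sxm_rate / pcie_rate


def cheaper_option(sxm_rate: float, pcie_rate: float, pcie_slowdown: float) -> str:
    """Return the form factor with the lower total cost for a fixed job.

    `pcie_slowdown` is PCIe runtime divided by SXM runtime (1.58 = 58% longer).
    """
    if pcie_slowdown < break_even_slowdown(sxm_rate, pcie_rate):
        return "PCIe"
    return "SXM"


threshold = break_even_slowdown(2.20, 1.49)
print(f"Break-even slowdown: {threshold:.2f}x")
print("58% longer on PCIe ->", cheaper_option(2.20, 1.49, pcie_slowdown=1.58))
print("29% longer on PCIe ->", cheaper_option(2.20, 1.49, pcie_slowdown=1.29))
```

At these rates the break-even point is a slowdown of about 1.48x. The 58% slowdown in this example is above it, so SXM wins on total cost. A workload that runs only 29% longer on PCIe, such as the BERT fine-tuning benchmark, would still cost less on PCIe, although it finishes later.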

Availability on io.net

| GPU Type | Availability | Avg. Wait Time | Typical io.net Price |
|---|---|---|---|
| H100 SXM | Limited | 0-5 minutes | $2.20/hour |
| H100 PCIe | Good | Instant | $1.49/hour |
| A100 SXM | Excellent | Instant | $1.10/hour |
| A100 PCIe | Excellent | Instant | $0.73/hour |

Recommendation Matrix

| Your Workload | Recommended | Reason |
|---|---|---|
| Single-GPU inference | PCIe | 32% cheaper, <5% slower |
| Multi-GPU training (8+ GPUs) | SXM | 45-60% faster, worth premium |
| Fine-tuning 7B-13B | PCIe | Sufficient performance, lower cost |
| Fine-tuning 70B+ | SXM | Faster convergence, better ROI |
| 24/7 inference serving | PCIe | Cost savings compound over time |
| Time-sensitive research | SXM | Maximum speed, results matter more than cost |
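The matrix can be condensed into a small decision helper. This is only a sketch of the rules of thumb in this article, not a sizing tool. The function name, parameter names, and workload labels are hypothetical. The 4-GPU training threshold follows the "When to Use Each" section, and fine-tuning between 13B and 70B defaults to PCIe as an assumption, since the matrix does not cover that range.

```python
def recommend_form_factor(
    workload: str,
    num_gpus: int = 1,
    model_params_b: float = 8.0,
    time_sensitive: bool = False,
) -> str:
    """Return "SXM" or "PCIe" based on the article's rules of thumb.

    workload: one of "inference", "training", "fine-tuning" (hypothetical labels).
    model_params_b: model size in billions of parameters.
    """
    if time_sensitive:
        return "SXM"  # speed matters more than cost
    if workload == "training" and num_gpus >= 4:
        return "SXM"  # NVLink pays off for multi-GPU training
    if workload == "fine-tuning" and model_params_b >= 70:
        return "SXM"  # large-model fine-tuning
    return "PCIe"  # inference, small fine-tunes, single-GPU work


print(recommend_form_factor("inference"))
print(recommend_form_factor("training", num_gpus=8, model_params_b=70))
print(recommend_form_factor("fine-tuning", model_params_b=13))
```

Treat the output as a starting point and confirm it with a short benchmark on your own workload.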

Deploy H100 SXM or PCIe on io.net with instant availability and 60-70% cost savings vs. AWS.