FAQ: What is the latency for GPU workloads on io.net?

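io.net delivers LLM inference with time-to-first-token (TTFT) latency that can be under 50ms, comparable to centralized clouds. In the benchmarks below, 7B-13B models on a single A100 measure 40-60ms and 70B models on 4x A100 measure 80-120ms. For distributed training, GPU communication latency within a cluster is <5ms, and within an H100 SXM node NVLink brings GPU-to-GPU latency below 2μs.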
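TTFT is the time from sending a request until the first generated token arrives. Below is a minimal sketch of measuring it client-side. It assumes a streaming endpoint exposed as a Python iterator of tokens. `measure_ttft` and `fake_stream` are hypothetical helpers, and `fake_stream` only simulates a server, so this is not io.net's API:

```python
import time
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


def measure_ttft(start_stream: Callable[[], Iterable[str]]) -> Tuple[Optional[float], str]:
    """Return (seconds until the first token, full generated text).

    `start_stream` should send the request and return an iterator of tokens,
    so the timer covers the request itself, not just token delivery.
    """
    start = time.perf_counter()
    ttft: Optional[float] = None
    pieces: List[str] = []
    for token in start_stream():
        if ttft is None:  # first token seen: record TTFT exactly once
            ttft = time.perf_counter() - start
        pieces.append(token)
    return ttft, "".join(pieces)


def fake_stream(prefill_s: float, tokens: List[str]) -> Iterator[str]:
    """Hypothetical stand-in for a streaming inference response."""
    time.sleep(prefill_s)  # simulated queueing + prefill before the first token
    yield from tokens


ttft, text = measure_ttft(lambda: fake_stream(0.05, ["Hello", ",", " world"]))
print(f"TTFT: {ttft * 1000:.1f}ms, text: {text!r}")
```

Timing starts before the request is issued because a measurement that begins only after the connection is open would under-report what the user actually waits.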

Latency Benchmarks

Workload	Typical latency	Configuration
LLM inference, TTFT (7B-13B)	40-60ms	Single GPU (A100)
LLM inference, TTFT (70B)	80-120ms	4x A100
Image generation (SDXL)	4-6s per image	RTX 4090
GPU-to-GPU communication (training)	<2μs	H100 SXM, NVLink

Regional Latency

Approximate network latency from the user to the GPU:
- Same region: 10-20ms
- Cross-region (US): 35-50ms
- Cross-continent: 80-150ms
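What a user perceives is roughly the network delay plus the model's TTFT. Below is a back-of-the-envelope sketch that combines the figures above. It assumes the two simply add, which ignores queueing, connection setup, and whether the regional figures are one-way or round-trip. `end_to_end` is a hypothetical helper:

```python
from typing import Dict, Tuple

Range = Tuple[float, float]  # (low_ms, high_ms)

# Figures copied from the benchmark table and regional list (ms).
NETWORK_MS: Dict[str, Range] = {
    "same region": (10, 20),
    "cross-region (US)": (35, 50),
    "cross-continent": (80, 150),
}
MODEL_TTFT_MS: Dict[str, Range] = {
    "LLM 7B-13B, 1x A100": (40, 60),
    "LLM 70B, 4x A100": (80, 120),
}


def end_to_end(network: Range, model: Range) -> Range:
    """Additive estimate: user-perceived TTFT = network delay + model TTFT."""
    return (network[0] + model[0], network[1] + model[1])


for net_name, net in NETWORK_MS.items():
    for model_name, model in MODEL_TTFT_MS.items():
        lo, hi = end_to_end(net, model)
        print(f"{model_name:<20} via {net_name:<18}: {lo:.0f}-{hi:.0f}ms")
```

Under this estimate, a 7B-13B model served in the user's own region lands at about 50-80ms end to end. The same model reached cross-continent lands at about 120-210ms, so region choice can matter as much as model size.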

Optimization: Deploy in the region nearest your users for the lowest latency.
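One way to act on this is to probe each candidate region a few times and deploy where the median latency is lowest. Below is a minimal sketch. The region names and probe values are hypothetical, not io.net's actual region identifiers. Collecting the probes (for example, timing a small request to each region's endpoint) is left out:

```python
import statistics
from typing import Dict, List


def pick_region(probes_ms: Dict[str, List[float]]) -> str:
    """Return the region whose median probe latency is lowest.

    Every region needs at least one probe; statistics.median raises on an empty list.
    """
    if not probes_ms:
        raise ValueError("no regions to choose from")
    return min(probes_ms, key=lambda region: statistics.median(probes_ms[region]))


# Hypothetical region names and probe results (ms), for illustration only.
probes = {
    "us-east": [18.0, 15.5, 16.2, 200.0],
    "us-west": [42.1, 40.3, 44.8, 41.0],
    "eu-central": [95.4, 101.2, 98.7, 97.5],
}
print(pick_region(probes))  # → us-east
```

The median is the key design choice here. With these numbers, the single 200ms outlier would push `us-east`'s mean (about 62ms) above `us-west`'s (about 42ms), even though `us-east` is usually the closer region.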

Low-latency GPU inference on io.net: <50ms TTFT, 50+ regions worldwide.