io.net provides 10-100 Gbps network connectivity depending on GPU tier and deployment configuration. Standard GPUs (RTX 4090, A100) include 10-25 Gbps networking, while enterprise H100 clusters offer 100 Gbps InfiniBand or RoCE (RDMA over Converged Ethernet) for low-latency multi-GPU communication. All deployments include unlimited free inbound data transfer; egress is free for the first 1 TB/month and $0.05/GB after that.
Network performance is optimized for AI workloads: sub-5ms latency within A100, L40S, and H100 clusters (under 10ms on RTX tiers), NVLink (900 GB/s) for multi-GPU training on H100 SXM configurations, and private VLAN networking for secure multi-node deployments.
Network Specifications by GPU Tier
| GPU Type | Network Speed | Interconnect | Latency (intra-cluster) | Best For |
|---|---|---|---|---|
| RTX 4090 | 10 Gbps | PCIe 4.0 | <10ms | Single-GPU inference, development |
| RTX 3090 | 10 Gbps | PCIe 3.0 | <10ms | Cost-efficient workloads |
| A100 (40/80GB) | 25 Gbps | PCIe 4.0 / NVLink | <5ms | Multi-GPU training, inference |
| H100 (80GB SXM) | 100 Gbps | InfiniBand / NVLink | <2ms | Large-scale distributed training |
| H100 (80GB PCIe) | 25-50 Gbps | PCIe 5.0 | <5ms | High-throughput inference |
| L40S | 25 Gbps | PCIe 4.0 | <5ms | Professional visualization, AI |
Multi-GPU Interconnects
NVLink (H100 SXM / A100 SXM):
- Bandwidth: 900 GB/s (H100), 600 GB/s (A100)
- Topology: All-to-all mesh (8 GPUs fully connected)
- Latency: <2μs GPU-to-GPU
- Use case: Distributed training (model parallelism, data parallelism)
InfiniBand (Enterprise H100 Clusters):
- Bandwidth: 100 Gbps per GPU (400 Gbps available)
- RDMA support: Yes (low CPU overhead)
- Latency: <2μs node-to-node
- Use case: 100+ GPU clusters, HPC workloads
PCIe (Standard GPUs):
- Bandwidth: 64 GB/s (PCIe 4.0), 128 GB/s (PCIe 5.0)
- Topology: GPU → CPU → Network
- Latency: 5-10μs
- Use case: Single-GPU or loosely-coupled workloads
Data Transfer Performance
Upload Speeds (to io.net):
# Test upload speed
dd if=/dev/zero bs=1M count=10000 | \
io upload --instance my-gpu stdin:/data/testfile
# Typical results:
# Residential (100 Mbps): 10-12 MB/s
# Enterprise (1 Gbps): 100-120 MB/s
# Data center (10 Gbps): 1-1.2 GB/s
Download Speeds (from io.net):
# Test download speed
io download my-gpu:/data/large-file.bin /dev/null
# Typical results:
# From same region: 100-200 MB/s (800-1,600 Mbps)
# Cross-region: 50-100 MB/s (400-800 Mbps)
# To AWS S3 (same region): 200-400 MB/s
Inter-GPU Data Transfer (within cluster):
# NVLink (H100 SXM): 900 GB/s (already gigabytes per second)
# Transfer 100GB model weights: ~0.11 seconds at the quoted peak rate
# PCIe 4.0 (A100): 64 GB/s (already gigabytes per second)
# Transfer 100GB model weights: ~1.6 seconds at the quoted peak rate
# Network (25 Gbps): 3.125 GB/sec
# Transfer 100GB model weights: 32 seconds
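The transfer-time arithmetic above hinges on keeping bits and bytes apart: a link rated in Gbps has to be divided by 8 before dividing the payload size, while an interconnect rated in GB/s does not. A minimal Python sketch of that calculation follows. The function name `transfer_seconds` and its `unit` parameter are illustrative, not part of any io.net tooling, and the result is an idealized lower bound that ignores protocol overhead and contention.

```python
def transfer_seconds(size_gb: float, rate: float, unit: str = "Gbps") -> float:
    """Return the ideal time in seconds to move size_gb gigabytes.

    unit is "Gbps" (gigabits per second) or "GB/s" (gigabytes per second).
    Overhead and contention are ignored, so real transfers take longer.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if unit == "Gbps":
        gbytes_per_second = rate / 8  # 8 bits per byte
    elif unit == "GB/s":
        gbytes_per_second = rate      # already in bytes, no conversion
    else:
        raise ValueError(f"unknown unit: {unit!r}")
    return size_gb / gbytes_per_second


if __name__ == "__main__":
    # 100 GB of model weights over two of the network speeds quoted in this page.
    for label, gbps in [("25 Gbps Ethernet", 25), ("100 Gbps InfiniBand", 100)]:
        print(f"{label}: {transfer_seconds(100, gbps):.1f} s")
```

Passing `unit="GB/s"` covers NVLink and PCIe figures, which are quoted in bytes per second.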
Bandwidth Allocation
Guaranteed Baseline:
- All GPUs: Minimum 10 Gbps (no throttling)
- No bandwidth caps during peak hours
- Fair queuing for network resources
Burst Capacity:
- Standard GPUs: Burst to 25 Gbps when available
- Enterprise GPUs: Burst to 100 Gbps on InfiniBand
Egress Pricing
Data Transfer Out:
| Monthly volume | Cost | Notes |
|---|---|---|
| First 1TB/month | Free | Promotional (all accounts) |
| 1-10 TB/month | $0.05/GB | 38-58% cheaper than AWS |
| 10-50 TB/month | $0.04/GB | Volume discount |
| 50+ TB/month | Custom pricing | Contact enterprise sales |
Comparison to AWS:
- AWS egress: $0.08-0.12/GB
- io.net egress: $0.05/GB (after 1TB free)
- Savings: 38-58%
Ingress (upload to io.net):
- Always free (unlimited)
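To estimate a monthly bill from the tiers above, the sketch below applies each rate only to the traffic that falls inside its tier. Two things are assumptions rather than stated policy: that tiers are billed marginally (not retroactively at the highest tier reached), and that 1 TB means 1,000 GB. The names `egress_cost_usd` and `savings_fraction` are illustrative.

```python
from typing import Optional

GB_PER_TB = 1000  # assumption: decimal terabytes

# (upper bound of tier in TB, USD per GB), copied from the pricing table.
TIERS = [(1, 0.00), (10, 0.05), (50, 0.04)]


def egress_cost_usd(total_tb: float) -> Optional[float]:
    """Estimate monthly egress cost; returns None above 50 TB (custom pricing)."""
    if total_tb < 0:
        raise ValueError("total_tb must be non-negative")
    if total_tb > 50:
        return None
    cost = 0.0
    lower = 0.0
    for upper, price_per_gb in TIERS:
        if total_tb <= lower:
            break
        billed_tb = min(total_tb, upper) - lower  # traffic inside this tier
        cost += billed_tb * GB_PER_TB * price_per_gb
        lower = upper
    return round(cost, 2)


def savings_fraction(price: float, reference_price: float) -> float:
    """Fractional saving of price relative to reference_price."""
    return (reference_price - price) / reference_price


if __name__ == "__main__":
    for tb in (0.5, 5, 20, 60):
        print(f"{tb} TB -> {egress_cost_usd(tb)}")
    # The AWS range quoted above is $0.08-0.12/GB.
    print(f"{savings_fraction(0.05, 0.08):.1%} to {savings_fraction(0.05, 0.12):.1%}")
```

The second function reproduces the 38-58% savings range from the quoted AWS prices.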
Latency Benchmarks
Inference API Response Time:
User Request → io.net GPU → Response
├─ Network RTT (ping): 15-50ms (depends on geography)
├─ Queue time: 5-20ms (depends on load)
├─ Inference time: 50-200ms (depends on model)
└─ Total: 70-270ms
Breakdown by region:
- Same region: 15ms RTT
- Cross-region (US East → West): 35ms RTT
- Cross-continent (US → EU): 80ms RTT
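The response-time budget above is a sum of independent ranges, so the best and worst cases can be added component by component. The following Python sketch does that and also shows how pinning the RTT to a specific region narrows the range. The dictionary keys and function names are illustrative, and the figures are the ones quoted in the breakdown.

```python
from typing import Dict, Tuple

Budget = Dict[str, Tuple[float, float]]  # component -> (best ms, worst ms)

BUDGET_MS: Budget = {
    "network_rtt": (15, 50),
    "queue": (5, 20),
    "inference": (50, 200),
}


def total_range_ms(budget: Budget) -> Tuple[float, float]:
    """Sum the best-case and worst-case latency across all components."""
    best = sum(low for low, _ in budget.values())
    worst = sum(high for _, high in budget.values())
    return best, worst


def with_fixed_rtt(budget: Budget, rtt_ms: float) -> Budget:
    """Return a copy of the budget with the network RTT pinned to one value."""
    pinned = dict(budget)
    pinned["network_rtt"] = (rtt_ms, rtt_ms)
    return pinned


if __name__ == "__main__":
    print("overall:", total_range_ms(BUDGET_MS))
    for region, rtt in [("same region", 15), ("US East to West", 35), ("US to EU", 80)]:
        print(region, total_range_ms(with_fixed_rtt(BUDGET_MS, rtt)))
```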
Multi-GPU Training Communication:
GPU Synchronization (gradient all-reduce):
├─ NVLink (H100 SXM): 2-5ms
├─ InfiniBand (100 Gbps): 3-8ms
├─ Ethernet (25 Gbps): 10-25ms
└─ PCIe (standard): 15-40ms
Impact on training speed:
- NVLink: 95-98% scaling efficiency (8 GPUs)
- InfiniBand: 90-95% scaling efficiency
- Ethernet: 80-90% scaling efficiency
- PCIe: 70-85% scaling efficiency
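One simple way to reason about how synchronization latency turns into scaling efficiency is to treat each training step as compute time plus a non-overlapped gradient sync. The sketch below uses that model. It is a deliberate simplification (real frameworks overlap communication with the backward pass), the 200 ms compute time is a hypothetical placeholder, and it is not claimed to be how the efficiency figures above were measured.

```python
def scaling_efficiency(compute_ms: float, sync_ms: float) -> float:
    """Fraction of each step spent on useful compute, assuming no overlap."""
    if compute_ms <= 0 or sync_ms < 0:
        raise ValueError("compute_ms must be positive and sync_ms non-negative")
    return compute_ms / (compute_ms + sync_ms)


def effective_gpus(gpu_count: int, efficiency: float) -> float:
    """Throughput expressed as an equivalent number of perfectly scaled GPUs."""
    return gpu_count * efficiency


if __name__ == "__main__":
    compute_ms = 200.0  # hypothetical per-step compute time
    for label, sync_ms in [("NVLink", 5), ("InfiniBand", 8), ("Ethernet", 25), ("PCIe", 40)]:
        eff = scaling_efficiency(compute_ms, sync_ms)
        print(f"{label}: {eff:.1%} efficiency, {effective_gpus(8, eff):.2f} effective GPUs")
```

Under this model, shorter steps are hurt more by the same sync latency, which is why interconnect choice matters most for small per-GPU batch sizes.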
Network Configuration
Private Networking:
# Create private VLAN for multi-GPU deployment
io network create --name private-cluster \
--subnet 10.0.1.0/24
# Deploy GPUs in private network
io deploy --gpu A100 --count 8 \
--network private-cluster \
--name training-cluster
# GPUs communicate via private 25 Gbps network
# Latency: <5ms intra-cluster
# Bandwidth: 25 Gbps per GPU (200 Gbps aggregate)
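Before choosing a subnet for a private cluster, it can help to check how many addresses it offers and what aggregate bandwidth a given GPU count implies. This sketch uses Python's standard `ipaddress` module. The helper names are illustrative, and the host count is an upper bound because the platform may reserve addresses (for a gateway, for example) that are not documented here.

```python
import ipaddress


def usable_hosts(cidr: str) -> int:
    """Count host addresses in a subnet (network and broadcast excluded for IPv4)."""
    network = ipaddress.ip_network(cidr)
    return sum(1 for _ in network.hosts())


def in_subnet(address: str, cidr: str) -> bool:
    """Check whether an address belongs to the given subnet."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


def aggregate_gbps(gpu_count: int, per_gpu_gbps: float) -> float:
    """Total bandwidth if every GPU drives its link at full rate."""
    return gpu_count * per_gpu_gbps


if __name__ == "__main__":
    subnet = "10.0.1.0/24"  # the subnet used in the example above
    print(usable_hosts(subnet), "usable addresses")
    print(in_subnet("10.0.1.17", subnet), in_subnet("10.0.2.17", subnet))
    print(aggregate_gbps(8, 25), "Gbps aggregate for 8 GPUs")
```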
Public Endpoints:
# Expose inference API with TLS
io deploy --gpu A100 \
--port 443 --ssl \
--domain api.example.com \
--name public-api
# Automatic TLS certificate provisioning
# DDoS protection included
# CDN integration for global low-latency
Optimization Tips
1. Co-locate data and compute:
# Mount S3 bucket in same region as GPU
io deploy --gpu A100 \
--mount s3://us-west-2-bucket:/data \
--region us-west-2
# Reduces data transfer latency: 80ms → 5ms
2. Use persistent storage for model weights:
# Pre-load models on persistent volume
io storage create --name models --size 500GB --region us-west-2
io upload models.tar.gz models:/
# All GPUs mount same volume (no network transfer needed)
io deploy --gpu A100 --count 4 \
--mount models:/models \
--name training-cluster
3. Enable GPU Direct RDMA (enterprise):
# H100 clusters with InfiniBand
io deploy --gpu H100 --count 16 \
--network-mode rdma \
--interconnect infiniband \
--name hpc-cluster
# GPU-to-GPU transfers bypass CPU entirely
# Latency: <2μs, Bandwidth: 100 Gbps per GPU
Real-World Performance Examples
Scenario 1: LLM Inference API
User location: New York
GPU location: US East (same region)
Model: Llama 3 8B
Network latency breakdown:
- User → io.net: 15ms (one-way)
- Request processing: 5ms
- Inference: 80ms (model execution)
- Response → User: 15ms
Total: 115ms (85ms is request processing plus inference, 30ms is network)
Optimization: Deploy in same region as users
Result: 70% of time is actual inference (good)
Scenario 2: Distributed Training (8x A100)
Workload: Llama 3 70B training
Configuration: 8x A100 with NVLink
Data size: 100GB dataset on S3
Initial data load: 320 seconds (2.5 Gbps avg from S3)
Training communication (gradient sync): 5-8ms per step
Network overhead: 3-5% of total training time
Optimization: Pre-load data to persistent storage
Result: <1% network overhead, 98% GPU utilization
Monitoring Network Performance
# Real-time network stats
io exec my-gpu -- iftop -i eth0
# Measure bandwidth to external endpoint
io exec my-gpu -- iperf3 -c iperf.he.net
# Latency monitoring
io exec my-gpu -- ping -c 100 8.8.8.8
# GPU interconnect bandwidth (NVLink/PCIe)
io exec my-gpu -- nvidia-smi nvlink --status
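To track latency over time rather than reading it by eye, the summary that Linux `ping` prints can be parsed into numbers. The sketch below handles the iputils summary format only (BSD and macOS label the line differently). The function name is illustrative and the sample output is made up for demonstration, not a measurement from io.net.

```python
import re

# Matches the iputils summary line, e.g. "rtt min/avg/max/mdev = 1.0/2.0/3.0/0.5 ms".
RTT_LINE = re.compile(r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")
LOSS_LINE = re.compile(r"([\d.]+)% packet loss")


def parse_ping_summary(output: str) -> dict:
    """Extract RTT statistics (ms) and packet loss (%) from ping output."""
    rtt = RTT_LINE.search(output)
    loss = LOSS_LINE.search(output)
    if rtt is None or loss is None:
        raise ValueError("no iputils ping summary found in output")
    min_ms, avg_ms, max_ms, mdev_ms = (float(value) for value in rtt.groups())
    return {
        "min_ms": min_ms,
        "avg_ms": avg_ms,
        "max_ms": max_ms,
        "mdev_ms": mdev_ms,
        "loss_percent": float(loss.group(1)),
    }


if __name__ == "__main__":
    # Illustrative sample only; the numbers are invented.
    sample = (
        "--- 8.8.8.8 ping statistics ---\n"
        "100 packets transmitted, 100 received, 0% packet loss, time 99123ms\n"
        "rtt min/avg/max/mdev = 14.532/15.120/17.201/0.612 ms\n"
    )
    print(parse_ping_summary(sample))
```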
Deploy GPUs on io.net with 10-100 Gbps networking and sub-5ms latency for distributed training.
