io.net provides 10-100 Gbps network connectivity depending on GPU tier and deployment configuration. Standard GPUs (RTX 4090, A100) include 10-25 Gbps networking, while enterprise H100 clusters offer 100 Gbps InfiniBand or RoCE (RDMA over Converged Ethernet) for low-latency multi-GPU communication. All deployments include unlimited inbound data transfer and competitive egress rates ($0.05/GB after 1TB/month).

Network performance is optimized for AI workloads: sub-5ms intra-cluster latency on A100, L40S, and H100 tiers (under 10ms on RTX-class GPUs), NVLink (900 GB/s) for multi-GPU training on H100 SXM configurations, and private VLAN networking for secure multi-node deployments.

Network Specifications by GPU Tier

| GPU Type | Network Speed | Interconnect | Latency (intra-cluster) | Best For |
|---|---|---|---|---|
| RTX 4090 | 10 Gbps | PCIe 4.0 | <10ms | Single-GPU inference, development |
| RTX 3090 | 10 Gbps | PCIe 3.0 | <10ms | Cost-efficient workloads |
| A100 (40/80GB) | 25 Gbps | PCIe 4.0 / NVLink | <5ms | Multi-GPU training, inference |
| H100 (80GB SXM) | 100 Gbps | InfiniBand / NVLink | <2ms | Large-scale distributed training |
| H100 (80GB PCIe) | 25-50 Gbps | PCIe 5.0 | <5ms | High-throughput inference |
| L40S | 25 Gbps | PCIe 4.0 | <5ms | Professional visualization, AI |

Multi-GPU Interconnects

NVLink (H100 SXM / A100 SXM):
- Bandwidth: 900 GB/s (H100), 600 GB/s (A100)
- Topology: All-to-all mesh (8 GPUs fully connected)
- Latency: <2μs GPU-to-GPU
- Use case: Distributed training (model parallelism, data parallelism)

InfiniBand (Enterprise H100 Clusters):
- Bandwidth: 100 Gbps per GPU (400 Gbps available)
- RDMA support: Yes (low CPU overhead)
- Latency: <2μs node-to-node
- Use case: 100+ GPU clusters, HPC workloads

PCIe (Standard GPUs):
- Bandwidth: 64 GB/s (PCIe 4.0 x16), 128 GB/s (PCIe 5.0 x16), both bidirectional
- Topology: GPU → CPU → Network
- Latency: 5-10μs
- Use case: Single-GPU or loosely-coupled workloads

Data Transfer Performance

Upload Speeds (to io.net):

# Test upload speed by streaming 10,000 MiB (~10.5 GB) of zeros
dd if=/dev/zero bs=1M count=10000 | \
  io upload --instance my-gpu stdin:/data/testfile

# Typical results:
# Residential (100 Mbps): 10-12 MB/s
# Enterprise (1 Gbps): 100-120 MB/s
# Data center (10 Gbps): 1-1.2 GB/s

Download Speeds (from io.net):

# Test download speed
io download my-gpu:/data/large-file.bin /dev/null

# Typical results:
# From same region: 100-200 MB/s (800-1,600 Mbps)
# Cross-region: 50-100 MB/s (400-800 Mbps)
# To AWS S3 (same region): 200-400 MB/s

Inter-GPU Data Transfer (within cluster):

# NVLink (H100 SXM): 900 GB/s
# Transfer 100GB model weights: ~0.11 seconds at peak

# PCIe 4.0 (A100): 64 GB/s
# Transfer 100GB model weights: ~1.6 seconds at peak

# Network (25 Gbps): 3.125 GB/sec
# Transfer 100GB model weights: 32 seconds
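
The estimates above are size divided by bandwidth, with one unit trap: network links are quoted in gigabits per second (Gbps), while NVLink and PCIe are quoted in gigabytes per second (GB/s). The following is a minimal Python sketch of that arithmetic. The function names are illustrative and not part of any io.net tooling, and the results are theoretical peaks that ignore protocol overhead.

```python
def to_gigabytes_per_second(value: float, unit: str) -> float:
    """Convert a bandwidth figure to gigabytes per second.

    unit is "Gbps" (gigabits per second) or "GB/s" (gigabytes per second).
    """
    if unit == "Gbps":
        return value / 8  # 8 bits per byte
    if unit == "GB/s":
        return value
    raise ValueError(f"unknown unit: {unit!r}")


def transfer_seconds(size_gb: float, bandwidth: float, unit: str) -> float:
    """Ideal time to move size_gb gigabytes at the given peak bandwidth."""
    return size_gb / to_gigabytes_per_second(bandwidth, unit)


if __name__ == "__main__":
    # Bandwidth figures are taken at face value from the text above.
    links = [
        ("NVLink (H100 SXM)", 900, "GB/s"),
        ("PCIe 4.0", 64, "GB/s"),
        ("Network", 25, "Gbps"),
    ]
    for label, bandwidth, unit in links:
        seconds = transfer_seconds(100, bandwidth, unit)
        print(f"{label}: {seconds:.2f} s for 100 GB")
```

Real transfers usually land below these peaks, so treat the output as a lower bound on transfer time.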

Bandwidth Allocation

Guaranteed Baseline:
- All GPUs: Minimum 10 Gbps (no throttling)
- No bandwidth caps during peak hours
- Fair queuing for network resources

Burst Capacity:
- Standard GPUs: Burst to 25 Gbps when available
- Enterprise GPUs: Burst to 100 Gbps on InfiniBand

Egress Pricing

Data Transfer Out:

| Monthly Volume | Cost | Notes |
|---|---|---|
| First 1TB/month | Free | Promotional (all accounts) |
| 1-10 TB/month | $0.05/GB | 38-58% cheaper than AWS |
| 10-50 TB/month | $0.04/GB | Volume discount |
| 50+ TB/month | Custom pricing | Contact enterprise sales |

Comparison to AWS:
- AWS egress: $0.08-0.12/GB
- io.net egress: $0.05/GB (after 1TB free)
- Savings: 38-58%

Ingress (upload to io.net):
- Always free (unlimited)
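
To make the egress tiers concrete, here is a small Python sketch that estimates a monthly bill. It rests on two assumptions the pricing text does not state: the tiers are graduated (each rate applies only to the gigabytes that fall inside its band), and 1 TB is counted as 1,000 GB. The names `TIERS`, `monthly_egress_cost`, and `savings_percent` are hypothetical.

```python
# (upper bound of the band in GB, price per GB in USD)
# Assumption: graduated tiers and 1 TB = 1,000 GB.
TIERS = [
    (1_000, 0.00),   # first 1 TB/month free
    (10_000, 0.05),  # 1-10 TB/month
    (50_000, 0.04),  # 10-50 TB/month
]


def monthly_egress_cost(gb: float) -> float:
    """Estimated egress charge in USD for one month of outbound traffic."""
    if gb < 0:
        raise ValueError("gb must be non-negative")
    if gb > TIERS[-1][0]:
        raise ValueError("above 50 TB/month pricing is custom")
    cost = 0.0
    lower = 0
    for upper, price in TIERS:
        if gb > lower:
            cost += (min(gb, upper) - lower) * price
        lower = upper
    return round(cost, 2)


def savings_percent(our_rate: float, their_rate: float) -> float:
    """Relative saving of our_rate against their_rate, in percent."""
    return (their_rate - our_rate) / their_rate * 100


if __name__ == "__main__":
    for volume_gb in (500, 5_000, 20_000):
        print(f"{volume_gb} GB -> ${monthly_egress_cost(volume_gb):.2f}")
    print(f"vs $0.08/GB: {savings_percent(0.05, 0.08):.1f}% cheaper")
    print(f"vs $0.12/GB: {savings_percent(0.05, 0.12):.1f}% cheaper")
```

If billing turns out to be flat-rate per tier rather than graduated, only the loop in `monthly_egress_cost` would need to change.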

Latency Benchmarks

Inference API Response Time:

User Request → io.net GPU → Response
├─ Network RTT (ping): 15-50ms (depends on geography)
├─ Queue time: 5-20ms (depends on load)
├─ Inference time: 50-200ms (depends on model)
└─ Total: 70-270ms

Breakdown by region:
- Same region: 15ms RTT
- Cross-region (US East → West): 35ms RTT
- Cross-continent (US → EU): 80ms RTT
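
The end-to-end range is the sum of the best-case and worst-case components. A minimal Python sketch of that budget follows, using the figures quoted above; `BUDGET_MS`, `REGION_RTT_MS`, and the function names are illustrative only.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]  # (best case ms, worst case ms)

# Component ranges as quoted in the breakdown above.
BUDGET_MS: Dict[str, Range] = {
    "network_rtt": (15, 50),
    "queue": (5, 20),
    "inference": (50, 200),
}

# Round-trip times by client location, as quoted above.
REGION_RTT_MS: Dict[str, float] = {
    "same_region": 15,
    "cross_region": 35,
    "cross_continent": 80,
}


def total_latency_ms(components: Dict[str, Range]) -> Range:
    """Sum best-case and worst-case latencies across all components."""
    best = sum(low for low, _ in components.values())
    worst = sum(high for _, high in components.values())
    return best, worst


def budget_for_region(region: str) -> Range:
    """Total latency range with the network RTT pinned to one region."""
    rtt = REGION_RTT_MS[region]
    components = dict(BUDGET_MS, network_rtt=(rtt, rtt))
    return total_latency_ms(components)


if __name__ == "__main__":
    print("overall:", total_latency_ms(BUDGET_MS))
    for region in REGION_RTT_MS:
        print(region, budget_for_region(region))
```

Pinning the RTT per region shows how much of the worst case comes from geography rather than from queueing or model execution.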

Multi-GPU Training Communication:

GPU Synchronization (gradient all-reduce):
├─ NVLink (H100 SXM): 2-5ms
├─ InfiniBand (100 Gbps): 3-8ms
├─ Ethernet (25 Gbps): 10-25ms
└─ PCIe (standard): 15-40ms

Impact on training speed:
- NVLink: 95-98% scaling efficiency (8 GPUs)
- InfiniBand: 90-95% scaling efficiency
- Ethernet: 80-90% scaling efficiency
- PCIe: 70-85% scaling efficiency
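
To see how synchronization time turns into scaling efficiency, the sketch below uses a simplified model, not io.net's measured numbers. It estimates ring all-reduce time from bandwidth alone using the standard 2(N-1)/N transfer factor, then computes efficiency assuming communication does not overlap with compute. The 50 MB gradient bucket and the 95 ms compute time are hypothetical inputs chosen for illustration.

```python
def ring_allreduce_seconds(payload_bytes: float, num_gpus: int,
                           link_bytes_per_s: float) -> float:
    """Bandwidth-only cost of a ring all-reduce.

    Each GPU sends and receives 2 * (N - 1) / N of the payload in total
    (reduce-scatter followed by all-gather). Per-hop latency is ignored.
    """
    if num_gpus < 2:
        return 0.0
    return 2 * (num_gpus - 1) / num_gpus * payload_bytes / link_bytes_per_s


def scaling_efficiency(compute_s: float, comm_s: float) -> float:
    """Fraction of a training step spent computing, assuming no overlap."""
    return compute_s / (compute_s + comm_s)


def effective_gpus(num_gpus: int, efficiency: float) -> float:
    """Number of ideal GPUs the cluster is equivalent to."""
    return num_gpus * efficiency


if __name__ == "__main__":
    link = 100e9 / 8  # 100 Gbps expressed in bytes per second
    sync = ring_allreduce_seconds(50e6, 8, link)  # hypothetical 50 MB bucket
    eff = scaling_efficiency(0.095, sync)         # hypothetical 95 ms compute
    print(f"sync time: {sync * 1e3:.1f} ms")
    print(f"efficiency: {eff:.1%}")
    print(f"effective GPUs: {effective_gpus(8, eff):.2f} of 8")
```

Frameworks that overlap gradient communication with the backward pass will typically do better than this no-overlap estimate.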

Network Configuration

Private Networking:

# Create private VLAN for multi-GPU deployment
io network create --name private-cluster \
  --subnet 10.0.1.0/24

# Deploy GPUs in private network
io deploy --gpu A100 --count 8 \
  --network private-cluster \
  --name training-cluster

# GPUs communicate via private 25 Gbps network
# Latency: <5ms intra-cluster
# Bandwidth: 25 Gbps per GPU (200 Gbps aggregate)
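
Before creating the network, it can help to confirm that the subnet has room for the cluster and to check the aggregate bandwidth figure. This is a small sketch using Python's standard `ipaddress` module. The helper names are hypothetical, and it assumes every usable host address is available to GPU nodes, which may not hold if the platform reserves addresses for gateways.

```python
import ipaddress
from typing import List


def plan_addresses(cidr: str, node_count: int) -> List[str]:
    """Return the first node_count usable host addresses in the subnet."""
    network = ipaddress.ip_network(cidr)
    hosts = list(network.hosts())  # excludes network and broadcast addresses
    if node_count > len(hosts):
        raise ValueError(
            f"{cidr} has {len(hosts)} usable addresses, need {node_count}"
        )
    return [str(host) for host in hosts[:node_count]]


def aggregate_gbps(node_count: int, per_gpu_gbps: float) -> float:
    """Total bandwidth if every GPU drives its own link at full rate."""
    return node_count * per_gpu_gbps


if __name__ == "__main__":
    addresses = plan_addresses("10.0.1.0/24", 8)
    print("node addresses:", ", ".join(addresses))
    print("aggregate:", aggregate_gbps(8, 25), "Gbps")
```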

Public Endpoints:

# Expose inference API with TLS
io deploy --gpu A100 \
  --port 443 --ssl \
  --domain api.example.com \
  --name public-api

# Automatic TLS certificate provisioning
# DDoS protection included
# CDN integration for global low-latency

Optimization Tips

1. Co-locate data and compute:

# Mount S3 bucket in same region as GPU
io deploy --gpu A100 \
  --mount s3://us-west-2-bucket:/data \
  --region us-west-2

# Reduces data transfer latency: 80ms → 5ms

2. Use persistent storage for model weights:

# Pre-load models on persistent volume
io storage create --name models --size 500GB --region us-west-2
io upload models.tar.gz models:/

# All GPUs mount same volume (no network transfer needed)
io deploy --gpu A100 --count 4 \
  --mount models:/models \
  --name training-cluster

3. Enable GPU Direct RDMA (enterprise):

# H100 clusters with InfiniBand
io deploy --gpu H100 --count 16 \
  --network-mode rdma \
  --interconnect infiniband \
  --name hpc-cluster

# GPU-to-GPU transfers bypass CPU entirely
# Latency: <2μs, Bandwidth: 100 Gbps per GPU

Real-World Performance Examples

Scenario 1: LLM Inference API

User location: New York
GPU location: US East (same region)
Model: Llama 3 8B

Network latency breakdown:
- User → io.net: 15ms
- Request processing: 5ms
- Inference: 80ms (model execution)
- Response → User: 15ms
Total: 115ms (80ms inference, 5ms request processing, 30ms network)

Optimization: Deploy in same region as users
Result: 70% of time is actual inference (good)

Scenario 2: Distributed Training (8x A100)

Workload: Llama 3 70B training
Configuration: 8x A100 with NVLink
Data size: 100GB dataset on S3

Initial data load: 320 seconds (2.5 Gbps avg from S3, about 312 MB/s)
Training communication (gradient sync): 5-8ms per step
Network overhead: 3-5% of total training time

Optimization: Pre-load data to persistent storage
Result: <1% network overhead, 98% GPU utilization

Monitoring Network Performance

# Real-time network stats
io exec my-gpu -- iftop -i eth0

# Measure bandwidth to external endpoint
io exec my-gpu -- iperf3 -c iperf.he.net

# Latency monitoring
io exec my-gpu -- ping -c 100 8.8.8.8

# NVLink link status and per-link speed
io exec my-gpu -- nvidia-smi nvlink --status

# GPU topology matrix (shows which GPU pairs use NVLink vs PCIe)
io exec my-gpu -- nvidia-smi topo -m
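
For tracking latency over time, the summary that `ping` prints can be turned into numbers. The sketch below parses the summary lines emitted by Linux iputils `ping` (and the similar macOS wording). How the output is captured from the instance is left out, and the sample text uses made-up values for illustration.

```python
import re
from typing import Dict

# Matches "rtt min/avg/max/mdev = a/b/c/d ms" (Linux iputils)
# and "round-trip min/avg/max/stddev = a/b/c/d ms" (macOS/BSD).
_RTT = re.compile(
    r"(?:rtt|round-trip) min/avg/max/(?:mdev|stddev) = "
    r"([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)
_LOSS = re.compile(r"([\d.]+)% packet loss")


def parse_ping_summary(output: str) -> Dict[str, float]:
    """Extract RTT statistics and packet loss from ping's summary lines."""
    rtt = _RTT.search(output)
    loss = _LOSS.search(output)
    if rtt is None or loss is None:
        raise ValueError("ping summary not found in output")
    low, avg, high, dev = (float(value) for value in rtt.groups())
    return {
        "min_ms": low,
        "avg_ms": avg,
        "max_ms": high,
        "dev_ms": dev,
        "loss_percent": float(loss.group(1)),
    }


if __name__ == "__main__":
    # Made-up sample in the iputils format, for illustration only.
    sample = (
        "--- 8.8.8.8 ping statistics ---\n"
        "100 packets transmitted, 100 received, 0% packet loss, time 99140ms\n"
        "rtt min/avg/max/mdev = 14.2/15.1/19.8/0.9 ms\n"
    )
    print(parse_ping_summary(sample))
```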

Deploy GPUs on io.net with 10-100 Gbps networking and sub-5ms latency for distributed training.