io.net supports all major AI/ML frameworks including PyTorch, TensorFlow, JAX, HuggingFace Transformers, vLLM, Ray, DeepSpeed, Axolotl, and Unsloth. All frameworks run via Docker containers with CUDA pre-installed—simply deploy your image or use io.net's pre-configured templates with the latest framework versions.
Supported Frameworks
Deep Learning Frameworks:
- PyTorch (2.x, 1.x) — Most popular for research and production
- TensorFlow (2.x) — Google's framework, Keras integration
- JAX — Google's autodiff library, XLA compilation
- MXNet — Apache framework, efficient for distributed training
- PaddlePaddle — Baidu's framework
LLM & NLP:
- HuggingFace Transformers — 100K+ pre-trained models
- vLLM — High-throughput inference (40-80 tokens/sec)
- Text Generation Inference (TGI) — HuggingFace's serving
- LangChain — LLM application framework
- LlamaIndex — Data framework for LLMs
Distributed Training:
- Ray — Distributed computing, hyperparameter tuning
- DeepSpeed — Microsoft's optimization library (ZeRO, 3D parallelism)
- Horovod — Uber's distributed training framework
- PyTorch DDP — Native distributed data parallel
- TensorFlow Distributed — TF's distributed strategy
Fine-Tuning & Training:
- Axolotl — Fine-tuning toolkit (LoRA, QLoRA, full fine-tuning)
- Unsloth — 2x faster fine-tuning, memory-efficient
- PEFT — Parameter-efficient fine-tuning (HuggingFace)
- TRL — Transformer Reinforcement Learning (RLHF, DPO)
Computer Vision:
- MMDetection — Object detection framework
- Detectron2 — Facebook's detection platform
- YOLOv8 — Real-time object detection
- Segment Anything (SAM) — Meta's segmentation model
- OpenCV — Computer vision library (GPU-accelerated)
Reinforcement Learning:
- RLlib — Ray's RL library
- Stable Baselines3 — RL algorithms (PPO, SAC, TD3)
- OpenAI Gym — RL environment framework
Data Processing:
- RAPIDS — NVIDIA's GPU-accelerated data science (cuDF, cuML)
- Dask — Parallel computing library
- Spark — Big data processing (GPU support via RAPIDS)
Pre-Configured Docker Images
# PyTorch (latest)
io deploy --image pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime --gpu A100
# TensorFlow (latest)
io deploy --image tensorflow/tensorflow:latest-gpu --gpu A100
# HuggingFace + vLLM
io deploy --image vllm/vllm-openai:latest --gpu A100
# Jupyter with PyTorch
io deploy --image jupyter/pytorch-notebook:latest --gpu RTX4090 --port 8888
# Ray cluster
io deploy --image rayproject/ray:latest-gpu --gpu A100 --count 4
# DeepSpeed
io deploy --image deepspeed/deepspeed:latest --gpu A100 --count 8
Custom Framework Installation
Dockerfile Example:
FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04
# Install Python
RUN apt-get update && apt-get install -y python3 python3-pip
# Install PyTorch
RUN pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# Install additional frameworks
RUN pip3 install transformers accelerate bitsandbytes
# Install custom dependencies
COPY requirements.txt /workspace/
RUN pip3 install -r /workspace/requirements.txt
# Set working directory
WORKDIR /workspace
Build and deploy:
docker build -t my-ml-image:latest .
io deploy --image my-ml-image:latest --gpu A100
Framework-Specific Guides
PyTorch Distributed Training:
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
# Initialize distributed environment
dist.init_process_group(backend="nccl")
# Wrap model with DDP
model = MyModel().to(device)
ddp_model = DDP(model, device_ids=[local_rank])
# Train across multiple GPUs
# io.net handles NCCL communication automatically
vLLM Inference:
io deploy --image vllm/vllm-openai:latest \
--gpu A100 \
--env MODEL=meta-llama/Meta-Llama-3-8B-Instruct \
--env MAX_MODEL_LEN=8192 \
--port 8000 \
--name vllm-api
# OpenAI-compatible API at https://xxx.ionet.cloud:8000
Ray Distributed:
import ray
ray.init(address="ray://xxx.ionet.cloud:10001")
@ray.remote(num_gpus=1)
def train_model(data):
# Distributed across cluster GPUs
return trained_model
results = ray.get([train_model.remote(d) for d in datasets])
GPU-Accelerated Libraries
RAPIDS (GPU DataFrames):
import cudf # GPU DataFrame
import cuml # GPU Machine Learning
# Read CSV on GPU
df = cudf.read_csv("large_dataset.csv")
# GPU-accelerated operations (10-50x faster than pandas)
df_filtered = df[df['value'] > 100].groupby('category').mean()
# Train model on GPU
from cuml.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)
JAX (Google):
import jax
import jax.numpy as jnp
# Automatic GPU usage
x = jnp.array([1, 2, 3])
y = jnp.dot(x, x) # Runs on GPU
# JIT compilation for performance
@jax.jit
def fast_function(x):
return jnp.sum(x ** 2)
Version Compatibility
| Framework | Recommended Version | CUDA Version | Python |
|---|---|---|---|
| PyTorch | 2.1.x | 12.1 | 3.10+ |
| TensorFlow | 2.14.x | 12.2 | 3.9+ |
| JAX | 0.4.x | 12.1 | 3.9+ |
| HuggingFace | 4.36.x | Any | 3.8+ |
| vLLM | 0.3.x | 12.1 | 3.9+ |
| DeepSpeed | 0.12.x | 11.8+ | 3.8+ |
Mixed Precision Training
# PyTorch AMP (Automatic Mixed Precision)
from torch.cuda.amp import autocast, GradScaler
scaler = GradScaler()
for batch in dataloader:
with autocast():
output = model(batch)
loss = loss_fn(output, target)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
# 2-3x faster training, 50% memory reduction
Deploy any AI framework on io.net with pre-configured containers and GPU optimization.
