Yes. io.net is optimized for batch inference, with automatic batching via vLLM, pay-per-use pricing, and horizontal scaling. You can process millions of inference requests cost-effectively by spinning up GPUs only when needed and terminating them once the batch completes.

Batch Inference Setup

# Deploy batch inference worker
io deploy --image vllm/vllm-openai:latest \
  --gpu A100 \
  --env MODEL=meta-llama/Meta-Llama-3-8B-Instruct \
  --env MAX_MODEL_LEN=8192 \
  --name batch-worker

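# Submit batch job: one JSON request body per line of batch_requests.jsonl
# (the OpenAI-compatible /v1/completions endpoint takes one JSON object per call)
while IFS= read -r req; do
  curl -s -X POST https://xxx.ionet.cloud/v1/completions \
    -H "Content-Type: application/json" \
    --data "$req"
done < batch_requests.jsonl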

# Auto-terminates after completion (cost = actual usage only)
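
The submission step can also be sketched in Python. This is a minimal sketch, assuming each line of batch_requests.jsonl holds one complete JSON body for the OpenAI-compatible /v1/completions endpoint that vLLM serves. The helper names iter_payloads, build_request, and send are hypothetical, the request fields in the sample are illustrative, and the xxx host is the same placeholder used in the curl example.

```python
import json
from urllib import request as urlrequest

# Placeholder host copied from the curl example; substitute your deployment's URL.
BASE_URL = "https://xxx.ionet.cloud"


def iter_payloads(jsonl_text):
    """Yield one request body (a dict) per non-empty JSONL line."""
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        payload = json.loads(line)
        if not isinstance(payload, dict):
            raise ValueError(f"line {line_no}: expected a JSON object")
        yield payload


def build_request(payload, base_url=BASE_URL):
    """Build (but do not send) a POST for a single completion request."""
    return urlrequest.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req, timeout=600):
    """Send one prepared request and return the decoded JSON response."""
    with urlrequest.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Illustrative input: two records, with a blank line between them.
sample = (
    '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", "prompt": "Hello", "max_tokens": 512}\n'
    "\n"
    '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", "prompt": "World", "max_tokens": 512}\n'
)
prepared = [build_request(p) for p in iter_payloads(sample)]
```

Building the requests separately from sending them makes it easy to validate a whole file before any GPU time is spent; calling send(prepared[0]) performs the actual network call.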

Cost Comparison: Batch Inference

Scenario: 1M inference requests, 512 tokens avg output

Provider        Setup     Time      Cost
io.net          4x A100   2.5 hrs   $11.00
AWS SageMaker   4x A100   3 hrs     $96-120
OpenAI API      N/A       N/A       $1,000-1,500

Savings: roughly 89-91% vs. AWS SageMaker and about 99% vs. the OpenAI API, based on the figures above
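
The per-unit numbers implied by the comparison can be worked out directly. The following is a short arithmetic sketch that uses only the values stated in the scenario and table; the derived rates are implications of those figures, not published prices, and the variable names are illustrative.

```python
# Figures taken from the scenario and comparison table.
requests_total = 1_000_000
avg_output_tokens = 512
gpus, hours, io_cost = 4, 2.5, 11.00
sagemaker_cost = (96.0, 120.0)    # low and high end of the quoted range
openai_cost = (1000.0, 1500.0)    # low and high end of the quoted range

gpu_hours = gpus * hours                            # GPU-hours consumed by the io.net run
implied_rate = io_cost / gpu_hours                  # dollars per GPU-hour
total_tokens = requests_total * avg_output_tokens   # output tokens in the whole batch
tokens_per_sec = total_tokens / (hours * 3600)      # aggregate across all four GPUs
cost_per_million_tokens = io_cost / (total_tokens / 1e6)


def savings(baseline):
    """Percentage saved by the io.net figure relative to a baseline cost."""
    return 100 * (1 - io_cost / baseline)


print(f"implied rate: ${implied_rate:.2f} per GPU-hour")
print(f"aggregate throughput: {tokens_per_sec:,.0f} output tokens/s")
print(f"cost per million output tokens: ${cost_per_million_tokens:.4f}")
print(f"vs SageMaker: {savings(sagemaker_cost[0]):.1f}% to {savings(sagemaker_cost[1]):.1f}%")
print(f"vs OpenAI API: {savings(openai_cost[0]):.1f}% to {savings(openai_cost[1]):.1f}%")
```

On these inputs the io.net row works out to $1.10 per GPU-hour and about 56,900 output tokens per second across the four GPUs, so the quoted cost depends on sustaining that throughput for the full run.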

Auto-Batching with vLLM

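# vLLM automatically batches in-flight requests for maximum throughput
# No manual batching code needed, but requests must overlap in time to be
# batched together; a blocking one-at-a-time loop gives vLLM nothing to batch
from concurrent.futures import ThreadPoolExecutor

# Submit requests individually and concurrently, vLLM batches internally
# (`client` and `requests` are placeholders for your own client and inputs)
with ThreadPoolExecutor(max_workers=64) as pool:
    responses = list(pool.map(client.complete, requests))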
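
Because vLLM's continuous batching groups requests that are in flight at the same time, the client has to keep many requests open at once for the server to have anything to batch. Below is a minimal sketch of that pattern, assuming a send callable that performs one blocking completion call. The names run_batch and fake_send and the max_workers value are hypothetical; fake_send stands in for the real HTTP call so the example runs without a server.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_batch(payloads, send, max_workers=32):
    """Send payloads concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, payloads))


# Bookkeeping for the stand-in: tracks how many calls overlap in time.
_lock = threading.Lock()
stats = {"in_flight": 0, "peak": 0}


def fake_send(payload):
    """Stand-in for one blocking completion request (no network involved)."""
    with _lock:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
    time.sleep(0.05)  # simulated server latency
    with _lock:
        stats["in_flight"] -= 1
    return {"prompt": payload["prompt"], "text": payload["prompt"].upper()}


payloads = [{"prompt": f"item {i}"} for i in range(16)]
results = run_batch(payloads, fake_send, max_workers=8)
print(f"{len(results)} results, peak concurrency {stats['peak']}")
```

The max_workers value caps how many requests are open at once, so it effectively bounds the batch size the server can form from this client; a suitable value depends on the model, GPU memory, and sequence lengths.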

Run batch inference on io.net with vLLM auto-batching and roughly 90% or greater cost savings vs. the alternatives compared above.