Yes. io.net is optimized for batch inference, with automatic request batching via vLLM, pay-per-use pricing, and horizontal scaling. You can process millions of inference requests cost-effectively by spinning up GPUs only when needed and terminating them once the batch completes.
Batch Inference Setup
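```bash
# Deploy batch inference worker
io deploy --image vllm/vllm-openai:latest \
  --gpu A100 \
  --env MODEL=meta-llama/Meta-Llama-3-8B-Instruct \
  --env MAX_MODEL_LEN=8192 \
  --name batch-worker

# Submit the batch: /v1/completions takes one JSON request body per call, so send
# each line of batch_requests.jsonl separately, several at once (here 32) so vLLM can batch them
tr '\n' '\0' < batch_requests.jsonl | xargs -0 -P 32 -I {} \
  curl -s -X POST https://xxx.ionet.cloud/v1/completions \
    -H "Content-Type: application/json" -d {}

# Auto-terminates after completion (cost = actual usage only)
```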
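Each line of `batch_requests.jsonl` should be a standalone request body. A minimal sketch of generating that file follows, assuming the OpenAI-style completions fields `model`, `prompt`, and `max_tokens`. The helper name `write_batch_file` and its defaults are hypothetical, chosen for illustration.

```python
import json

MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"  # same model as the deploy step


def write_batch_file(prompts, path="batch_requests.jsonl", max_tokens=512):
    """Write one /v1/completions request body per line (JSON Lines format).

    Hypothetical helper: field names assume the OpenAI-style completions API.
    """
    with open(path, "w", encoding="utf-8") as f:
        for prompt in prompts:
            body = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
            # json.dumps escapes embedded newlines, so each body stays on one line
            f.write(json.dumps(body) + "\n")
    return path
```

For example, `write_batch_file([f"Summarize document {i}." for i in range(1000)])` produces a 1,000-line file ready for the submit step. Escaping newlines matters because the submit step splits the file on line boundaries.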
Cost Comparison: Batch Inference
Scenario: 1M inference requests, 512 tokens avg output
| Provider | Hardware | Wall-clock time | Total cost |
|---|---|---|---|
| io.net | 4x A100 | 2.5 hrs | $11.00 |
| AWS SageMaker | 4x A100 | 3 hrs | $96-120 |
| OpenAI API | N/A (hosted) | N/A | $1,000-1,500 |
Savings: about 89-91% vs. AWS SageMaker and about 99% vs. the OpenAI API
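The table's figures can be sanity-checked with a few lines of arithmetic. The sketch below uses only the numbers in the scenario and table to derive the implied io.net price per GPU-hour, the aggregate output throughput the 2.5-hour run requires, and the savings range against each alternative. It adds no new pricing data.

```python
OUTPUT_TOKENS = 1_000_000 * 512  # 1M requests x 512 tokens average output
RUN_HOURS = 2.5                  # io.net wall-clock time from the table


def savings(ours, theirs):
    """Fraction saved by paying `ours` instead of `theirs`."""
    return 1 - ours / theirs


io_net_cost = 11.00
io_net_gpu_hours = 4 * RUN_HOURS                      # 4x A100 for 2.5 hrs
price_per_gpu_hour = io_net_cost / io_net_gpu_hours   # implied $/GPU-hour
tokens_per_second = OUTPUT_TOKENS / (RUN_HOURS * 3600)  # aggregate across 4 GPUs

alternatives = {"AWS SageMaker": (96, 120), "OpenAI API": (1000, 1500)}

print(f"Implied io.net rate: ${price_per_gpu_hour:.2f}/GPU-hr")
print(f"Required throughput: {tokens_per_second:,.0f} output tokens/s")
for name, (low, high) in alternatives.items():
    print(f"vs. {name}: {savings(io_net_cost, low):.1%}-{savings(io_net_cost, high):.1%} saved")
```

With the table's numbers this prints an implied rate of $1.10/GPU-hr, a required throughput of about 56,889 output tokens/s (roughly 14,200 per GPU), and savings of 88.5%-90.8% vs. SageMaker and 98.9%-99.3% vs. the OpenAI API.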
Auto-Batching with vLLM
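```python
# vLLM batches concurrent requests internally, so no manual batching code is needed;
# requests only need to be in flight at the same time, so submit them concurrently.
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI(base_url="https://xxx.ionet.cloud/v1", api_key="YOUR_API_KEY")  # placeholder key
def complete(prompt):
    return client.completions.create(model="meta-llama/Meta-Llama-3-8B-Instruct", prompt=prompt, max_tokens=512)
with ThreadPoolExecutor(max_workers=32) as pool:
    responses = list(pool.map(complete, prompts))  # prompts: your list of prompt strings
```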
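To see why concurrent submission matters, here is a toy dynamic batcher in plain `asyncio`. It is an illustration only, not vLLM's scheduler (vLLM batches at the token level with continuous batching). A single worker drains whatever requests are queued, up to a hypothetical `max_batch` size, and serves them in one simulated GPU step. The names `ToyBatcher` and `demo` are hypothetical. Awaiting requests one at a time yields batches of size 1, while submitting them together fills each batch.

```python
import asyncio


class ToyBatcher:
    """Serve queued requests in groups; a stand-in for server-side batching."""

    def __init__(self, max_batch=8):
        self.max_batch = max_batch
        self.queue = asyncio.Queue()
        self.batch_sizes = []  # how many requests each simulated GPU step served

    async def submit(self, prompt):
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((prompt, fut))
        return await fut

    async def run(self):
        while True:
            batch = [await self.queue.get()]  # wait for at least one request
            while len(batch) < self.max_batch and not self.queue.empty():
                batch.append(self.queue.get_nowait())  # add whatever else is pending
            self.batch_sizes.append(len(batch))
            await asyncio.sleep(0.01)  # simulated GPU step: same cost for any batch size
            for prompt, fut in batch:
                fut.set_result(prompt.upper())


async def demo(concurrent):
    batcher = ToyBatcher(max_batch=8)
    worker = asyncio.create_task(batcher.run())
    prompts = [f"request {i}" for i in range(16)]
    if concurrent:
        results = await asyncio.gather(*(batcher.submit(p) for p in prompts))
    else:
        results = [await batcher.submit(p) for p in prompts]
    worker.cancel()
    return results, batcher.batch_sizes


if __name__ == "__main__":
    print(asyncio.run(demo(concurrent=False))[1])  # sixteen batches of size 1
    print(asyncio.run(demo(concurrent=True))[1])   # [8, 8]
```

Since each simulated step costs the same regardless of batch size, the concurrent run finishes 16 requests in 2 steps instead of 16. The same effect is why the client code should keep many requests in flight.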
Run batch inference on io.net with vLLM auto-batching; in the scenario above, that costs roughly 89-99% less than SageMaker or the OpenAI API.
