Important Note on Usage Limits
The IO Intelligence API provides the following free daily limits (measured in LLM tokens) per account, per day, per model.

Column Definitions:
- LLM Model Name: The name of the large language model (LLM) available for use.
- Daily Chat Quota: The maximum number of tokens you can use in chat-based interactions with this model per day.
- Daily API Quota: The maximum number of tokens allowed for API-based interactions per day.
- Daily Embeddings Quota: The maximum number of tokens available for embedding operations per day.
- Context Length: The maximum number of tokens the model can process in a single request (including both input and output).
LLM Model Name | Daily Chat Quota | Daily API Quota | Daily Embeddings Quota | Context Length |
---|---|---|---|---|
deepseek-ai/DeepSeek-R1-0528 | 1,000,000 tk | 500,000 tk | N/A | 128,000 tk |
swiss-ai/Apertus-70B-Instruct-2509 | 1,000,000 tk | 500,000 tk | N/A | 131,072 tk |
meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | 1,000,000 tk | 500,000 tk | N/A | 430,000 tk |
openai/gpt-oss-120b | 1,000,000 tk | 500,000 tk | N/A | 131,072 tk |
Intel/Qwen3-Coder-480B-A35B-Instruct-int4-mixed-ar | 1,000,000 tk | 500,000 tk | N/A | 106,000 tk |
Qwen/Qwen3-Next-80B-A3B-Instruct | 1,000,000 tk | 500,000 tk | N/A | 262,144 tk |
openai/gpt-oss-20b | 1,000,000 tk | 500,000 tk | N/A | 131,072 tk |
Qwen3-235B-A22B-Thinking-2507 | 1,000,000 tk | 500,000 tk | N/A | 262,144 tk |
mistralai/Mistral-Nemo-Instruct-2407 | 1,000,000 tk | 500,000 tk | N/A | 128,000 tk |
mistralai/Magistral-Small-2506 | 1,000,000 tk | 500,000 tk | N/A | 128,000 tk |
mistralai/Devstral-Small-2505 | 1,000,000 tk | 500,000 tk | N/A | 128,000 tk |
LLM360/K2-Think | 1,000,000 tk | 500,000 tk | N/A | 152,064 tk |
meta-llama/Llama-3.3-70B-Instruct | 1,000,000 tk | 500,000 tk | N/A | 128,000 tk |
mistralai/Mistral-Large-Instruct-2411 | 1,000,000 tk | 500,000 tk | N/A | 128,000 tk |
Qwen/Qwen2.5-VL-32B-Instruct | N/A | 200,000 tk | N/A | 32,000 tk |
meta-llama/Llama-3.2-90B-Vision-Instruct | N/A | 200,000 tk | N/A | 16,000 tk |
BAAI/bge-multilingual-gemma2 | N/A | N/A | 1,000,000 tk | 4,096 tk |
Introduction
You can interact with the API using HTTP requests from any programming language, or by using the official Python and Node.js libraries. To install the official Python library, run `pip install openai` (the examples below use the OpenAI-compatible client).

Example: Using the IO Intelligence API with Python

Here's an example of how you can use the openai Python library to interact with the IO Intelligence API: send a prompt to the Llama-3.3-70B-Instruct model and retrieve a response.
Authentication
API keys
IO Intelligence APIs authenticate requests using API keys. You can generate API keys from your user account.

Always treat your API key as a secret! Do not share it or expose it in client-side code (e.g., browsers or mobile apps). Instead, store it securely in an environment variable or a key management service on your backend server.
Include your API key in the Authorization HTTP header for all API requests:
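The header itself seems to have dropped out of the page; assuming the standard bearer-token scheme used by OpenAI-compatible APIs, it takes this form:

```
Authorization: Bearer $IOINTELLIGENCE_API_KEY
```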
Example: List Available Models
Here’s an example curl command to list all models available in IO Intelligence:
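The command itself appears to be missing; a sketch, assuming the base URL `https://api.intelligence.io.solutions/api/v1` (confirm the endpoint in your IO Intelligence dashboard):

```shell
# List all available models (GET /models)
curl https://api.intelligence.io.solutions/api/v1/models \
  -H "Authorization: Bearer $IOINTELLIGENCE_API_KEY"
```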

Making requests
To test the API, use the following curl command. Replace $IOINTELLIGENCE_API_KEY with your actual API key.

This request uses the meta-llama/Llama-3.3-70B-Instruct model to generate a chat completion for the input: “Say this is a test!”:
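The curl command was lost in extraction; a sketch, again assuming the base URL `https://api.intelligence.io.solutions/api/v1` (check your dashboard for the exact endpoint):

```shell
# Create a chat completion (POST /chat/completions)
curl https://api.intelligence.io.solutions/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $IOINTELLIGENCE_API_KEY" \
  -d '{
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [{"role": "user", "content": "Say this is a test!"}]
  }'
```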
Example Response
The API should return a response like this:

Key Details in the Response
- finish_reason: Indicates why the generation stopped (e.g., “stop”).
- choices: Contains the generated response(s). Adjust the n parameter to generate multiple response choices.
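The example response body was lost from the page; an illustrative OpenAI-compatible chat completion object (field values are hypothetical) has this shape, matching the fields described above:

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "This is a test!"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 5,
    "total_tokens": 18
  }
}
```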