Authorizations
The access token received from the authorization server in the OAuth 2.0 flow.
Headers
JWT token
io.net provided API Key
API key set by an SDK client
Body
The conversation history
1
- ChatCompletionDeveloperMessageParam
- ChatCompletionSystemMessageParam
- ChatCompletionUserMessageParam
- ChatCompletionAssistantMessageParam
- ChatCompletionToolMessageParam
- ChatCompletionFunctionMessageParam
- CustomChatCompletionMessageParam
1 <= x <= 20
"none"
x >= 1
If true, the new message will be prepended with the last message if they belong to the same role.
If true, the generation prompt will be added to the chat template. This is a parameter used by chat template in tokenizer config of the model.
If this is set, the chat will be formatted so that the final message in the chat is open-ended, without any EOS tokens. The model will continue this message rather than starting a new one. This allows you to "prefill" part of the model's response for it. Cannot be used at the same time as add_generation_prompt
.
If true, special tokens (e.g. BOS) will be added to the prompt on top of what is added by the chat template. For most models, the chat template takes care of adding the special tokens so this should be set to false (as is the default).
A list of dicts representing documents that will be accessible to the model if it is performing RAG (retrieval-augmented generation). If the template does not support RAG, this argument will have no effect. We recommend that each document should be a dict containing "title" and "text" keys.
A Jinja template to use for this conversion. As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.
Additional kwargs to pass to the template renderer. Will be accessible by the chat template.
If specified, the output will follow the JSON schema.
If specified, the output will follow the regex pattern.
If specified, the output will be exactly one of the choices.
If specified, the output will follow the context free grammar.
If specified, will override the default guided decoding backend of the server for this specific request. If set, must be either 'outlines' / 'lm-format-enforcer'
If specified, will override the default whitespace pattern for guided json decoding.
The priority of the request (lower means earlier handling; default: 0). Any priority other than 0 will raise an error if the served model does not use priority scheduling.
The request_id related to this request. If the caller does not set it, a random_uuid will be generated. This id is used through out the inference process and return in response.
Response
Successful Response
The response is of type any
.