ChatCompletionsRequestPayload


Request payload for generating chat completions.

JSON Example
{
    "max_tokens": 64,
    "messages": [
        {
            "content": "You are a helpful assistant.",
            "role": "system"
        },
        {
            "content": "Hello!",
            "role": "user"
        }
    ],
    "model": "model-to-use",
    "n": 1,
    "stop": [
        "END"
    ],
    "stream": false,
    "temperature": 0
}
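The same payload can be assembled programmatically. The following is a minimal Python sketch, not an official client: the helper name `build_chat_payload` is hypothetical, and it uses only field names that appear in this schema. Optional fields that are not supplied are left out of the request body.

```python
import json


def build_chat_payload(model, messages, **optional):
    """Assemble a ChatCompletionsRequestPayload as a plain dict.

    Hypothetical helper for illustration; not part of any SDK.
    Optional fields passed as None are omitted from the payload.
    """
    payload = {"model": model, "messages": list(messages)}
    for key, value in optional.items():
        if value is not None:
            payload[key] = value
    return payload


payload = build_chat_payload(
    model="model-to-use",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    max_tokens=64,
    n=1,
    stop=["END"],
    stream=False,
    temperature=0,
)

# sort_keys mirrors the alphabetical key order used in the JSON example.
body = json.dumps(payload, indent=4, sort_keys=True)
print(body)
```

Only `model` and `messages` are required, so the helper takes them as positional arguments and treats everything else as optional keyword arguments.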
number
temperature
Optional
Constraints: minimum: 0, maximum: 2, default: 0

Sampling temperature. Allowed range: 0 to 2; defaults to 0.

integer
n
Optional
Constraints: minimum: 1, default: 1

Number of chat completion choices to generate for the request. Defaults to 1.

array of string
stop
Optional
Constraints: minItems: 1

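Sequences at which the model stops generating further tokens. If provided, the array must contain at least one string.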

integer
max_tokens
Optional
Constraints: minimum: 1

Maximum number of tokens to generate in the completion. Must be at least 1.

boolean
stream
Optional

Whether to stream the response incrementally instead of returning it as a single response.

stream_options
Optional

Options for streaming response. Only set this when you set stream: true.
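The rule that stream_options belongs only on streaming requests can be checked on the client before sending. The sketch below is an assumption-laden illustration: `with_streaming` and `check_stream_options` are hypothetical helper names, and because this section does not list the keys allowed inside stream_options, the value is passed through untouched.

```python
def with_streaming(payload, stream_options=None):
    """Return a copy of the payload with streaming enabled.

    Hypothetical helper. stream_options is attached only together
    with stream: true, as the schema description requires.
    """
    result = dict(payload)
    result["stream"] = True
    if stream_options is not None:
        result["stream_options"] = stream_options
    return result


def check_stream_options(payload):
    """Raise ValueError if stream_options is set without stream: true."""
    if "stream_options" in payload and payload.get("stream") is not True:
        raise ValueError("stream_options requires stream: true")


base = {
    "model": "model-to-use",
    "messages": [{"role": "user", "content": "Hello!"}],
}

# Placeholder options dict; the allowed keys are not listed in this section.
streamed = with_streaming(base, stream_options={})
check_stream_options(streamed)  # passes: stream is true
```

Returning a copy rather than mutating the input keeps the original non-streaming payload reusable.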

string
model
Required

ID of the completions model to use.

array of ChatMessage
messages
Required
Constraints: minItems: 1

Messages that make up the conversation, in order. At least one message is required.

tools
Optional

Tools that the model may call.

integer
seed
Optional

Seed passed to the LLM so that repeated requests with the same seed produce results that are as deterministic as possible. Note that this feature is in beta for most inference servers.

tool_choice
Optional

Controls which tool, if any, the model calls.
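The types and constraints listed for each property can be checked on the client before a request is sent. This is a minimal sketch under stated assumptions: `validate_payload` is a hypothetical helper, it covers only the constraints documented on this page, and it does not inspect the structure of individual messages, tools, tool_choice, or stream_options. The server remains the authority on what is accepted.

```python
def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_payload(payload):
    """Return a list of constraint violations (empty if none were found)."""
    errors = []
    if not isinstance(payload.get("model"), str):
        errors.append("model: required string")
    messages = payload.get("messages")
    if not isinstance(messages, list) or len(messages) < 1:
        errors.append("messages: required array with at least 1 item")
    if "temperature" in payload:
        t = payload["temperature"]
        if not _is_number(t) or not 0 <= t <= 2:
            errors.append("temperature: number between 0 and 2")
    if "n" in payload:
        if not _is_int(payload["n"]) or payload["n"] < 1:
            errors.append("n: integer, minimum 1")
    if "max_tokens" in payload:
        if not _is_int(payload["max_tokens"]) or payload["max_tokens"] < 1:
            errors.append("max_tokens: integer, minimum 1")
    if "stop" in payload:
        stop = payload["stop"]
        if (not isinstance(stop, list) or len(stop) < 1
                or not all(isinstance(s, str) for s in stop)):
            errors.append("stop: array of string with at least 1 item")
    if "stream" in payload and not isinstance(payload["stream"], bool):
        errors.append("stream: boolean")
    if "seed" in payload and not _is_int(payload["seed"]):
        errors.append("seed: integer")
    return errors


example = {
    "max_tokens": 64,
    "messages": [
        {"content": "You are a helpful assistant.", "role": "system"},
        {"content": "Hello!", "role": "user"},
    ],
    "model": "model-to-use",
    "n": 1,
    "stop": ["END"],
    "stream": False,
    "temperature": 0,
}
print(validate_payload(example))  # prints []
```

Collecting every violation in a list, rather than raising on the first one, lets a caller report all problems with a payload at once.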