Create Assistant Chat Completion
This method creates a model response for the given chat conversation. Before the conversation is forwarded to the LLM, it is updated to reflect the agent settings and to include any inputs from enabled integrations.
This method is compatible with the OpenAI endpoint for creating a chat completion. However, it infers additional inputs to the conversation, as described above.
agent_id
{
  "messages": [
    {
      "content": "You are a helpful assistant.",
      "role": "system"
    },
    {
      "content": "Hello!",
      "role": "user"
    }
  ]
}
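As a rough illustration of how the example body above could be sent from Python, the sketch below builds (but does not send) the HTTP request using only the standard library. The URL path is the one shown in the curl example at the end of this reference; the host name, agent ID, and Authorization value are placeholders, and `build_chat_request` is a hypothetical helper name, not part of the API.

```python
import json
import urllib.parse
import urllib.request

# Path taken from the curl example in this reference.
ENDPOINT_PATH = "/api/v1/compatibility/openai/v1/assistants/{agent_id}/chat/completions"


def build_chat_request(api_host, agent_id, authorization, messages):
    """Build (but do not send) a POST request for the chat completion endpoint."""
    path = ENDPOINT_PATH.format(agent_id=urllib.parse.quote(agent_id, safe=""))
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        "https://" + api_host + path,
        data=body,
        method="POST",
        headers={
            # The reference only shows 'Authorization: <value>', so the
            # header value is passed through unchanged.
            "Authorization": authorization,
            "Content-Type": "application/json",
        },
    )


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# "api.example.com" and "my-agent-id" are placeholders, not real values.
request = build_chat_request("api.example.com", "my-agent-id", "<value>", messages)
```

Passing `request` to `urllib.request.urlopen` would send it; that call is left out here so the sketch runs without network access.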
{
  "create_session": false,
  "temperature": "number",
  "n": 0,
  "stop": [
    "string"
  ],
  "max_tokens": 0,
  "stream": false,
  "stream_options": {
    "include_usage": false
  },
  "model": "string",
  "messages": [
    {
      "role": "string",
      "content": "string",
      "tool_calls": [
        {
          "id": "string",
          "type": "string",
          "function": {
            "name": "string",
            "arguments": "string"
          }
        }
      ],
      "refusal": "string",
      "tool_call_id": "string"
    }
  ],
  "tools": [
    {
      "type": "string",
      "function": {
        "name": "string",
        "description": "string",
        "parameters": {
          "properties": {
            "properties": {
              "type": "string",
              "description": "string",
              "enum": [
                {}
              ],
              "items": {
                "type": "string"
              }
            }
          },
          "type": "string",
          "required": [
            "string"
          ],
          "additionalProperties": false
        },
        "strict": false
      },
      "uc_function": {
        "name": "string"
      }
    }
  ],
  "seed": 0,
  "tool_choice": "string",
  "store_in_session": "string"
}
create_session
If true, the request creates a new agent session. The LLM interaction is stored in that session and serves as context for subsequent agent interactions that use it.
temperature
n
stop
max_tokens
stream
stream_options
Options for the streaming response. Only set this when stream is true.
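The rule that streaming options only apply when streaming is enabled can be checked on the client before a request is sent. The following is a minimal sketch; `with_streaming` and `check_stream_options` are hypothetical helper names, and whether the server itself rejects such a payload is not stated in this reference.

```python
def with_streaming(payload, include_usage=False):
    """Return a copy of the payload with streaming enabled."""
    updated = dict(payload)
    updated["stream"] = True
    if include_usage:
        # Field names follow the request schema in this reference.
        updated["stream_options"] = {"include_usage": True}
    return updated


def check_stream_options(payload):
    """Raise ValueError if stream_options is set while stream is not true."""
    if "stream_options" in payload and payload.get("stream") is not True:
        raise ValueError("stream_options requires stream to be true")


base = {"messages": [{"role": "user", "content": "Hello!"}]}
streaming_payload = with_streaming(base, include_usage=True)
check_stream_options(streaming_payload)  # passes silently
```

Copying the payload rather than mutating it keeps the non-streaming request reusable.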
model
Optional ID of the model to use. If provided, it must match the model specified in the agent configuration. Omit this value unless the client needs to validate that the agent uses the specified model; when it is omitted, the API chooses the correct model. For compatibility with the OpenAI client SDK, this parameter may be left unset or set to an empty string to use the agent's default configuration.
messages
A chat request. content can be a string or an array of content parts.
A content part is one of the following:
- TextContentPart (mlflow.types.chat.TextContentPart)
- ImageContentPart (mlflow.types.chat.ImageContentPart)
- AudioContentPart (mlflow.types.chat.AudioContentPart)
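To make the two forms of content concrete, the sketch below builds one message with plain string content and one with an array of content parts. The part shapes (`"type": "text"` with a `text` field, `"type": "image_url"` with an `image_url` object) are assumed from the OpenAI-style chat format that the MLflow types mirror; this reference does not spell them out, and the image URL is a placeholder.

```python
import json

# Message whose content is a plain string.
text_message = {"role": "user", "content": "Describe this picture."}

# Message whose content is an array of content parts.
# Part field names are assumptions based on the OpenAI-style chat format.
multipart_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this picture."},
        {
            "type": "image_url",
            "image_url": {"url": "https://example.com/picture.png"},  # placeholder URL
        },
    ],
}

payload = {"messages": [text_message, multipart_message]}
encoded = json.dumps(payload)
```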
tools
A tool definition for the chat endpoint with Unity Catalog integration. The Gateway request accepts a special tool type 'uc_function' for Unity Catalog integration. See https://mlflow.org/docs/latest/llms/deployments/uc_integration.html
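As an illustration of the two tool kinds, the sketch below builds a payload with one regular function tool and one Unity Catalog tool. It assumes that a regular tool uses `"type": "function"` (the OpenAI convention) and that its `parameters` field is an ordinary JSON Schema object; the function name `get_weather` and the Unity Catalog name `my_catalog.my_schema.my_function` are made up for the example.

```python
import json

# A regular function tool, following the "function" shape in the request schema.
weather_tool = {
    "type": "function",  # assumed value, following the OpenAI convention
    "function": {
        "name": "get_weather",  # hypothetical function name
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
            "additionalProperties": False,
        },
        "strict": False,
    },
}

# A Unity Catalog tool: the special 'uc_function' type names a UC function.
uc_tool = {
    "type": "uc_function",
    "uc_function": {"name": "my_catalog.my_schema.my_function"},  # hypothetical name
}

payload = {
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [weather_tool, uc_tool],
}
encoded = json.dumps(payload, indent=2)
```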
seed
Seed to propagate to the LLM so that repeated requests with the same seed are as deterministic as possible. Note that this feature is in beta for most inference servers.
tool_choice
store_in_session
If set, use and extend the context stored in the given session for all LLM interactions.
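The two session-related fields could be combined across turns roughly as follows: the first request sets create_session, and later requests pass the session identifier in store_in_session. This is only a sketch of the payloads. How the identifier of the generated session is returned to the client is not described in this reference, so `"session-123"` is a placeholder, and `first_turn` and `follow_up_turn` are hypothetical helper names.

```python
def first_turn(messages):
    """Payload that asks the API to create a new agent session."""
    return {"messages": messages, "create_session": True}


def follow_up_turn(messages, session_id):
    """Payload that uses and extends the context stored in an existing session."""
    return {"messages": messages, "store_in_session": session_id}


opening = first_turn([{"role": "user", "content": "Hello!"}])

# "session-123" is a placeholder; obtain the real identifier however the
# API exposes the generated session.
follow_up = follow_up_turn(
    [{"role": "user", "content": "What did I just say?"}],
    "session-123",
)
```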
Successful Response
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Hello! I am an AI assistant",
        "role": "assistant"
      }
    }
  ],
  "created": 1700173217,
  "id": "3cdb958c-e4cc-4834-b52b-1d1a7f324714",
  "model": "llama-2-70b-chat-hf",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 8,
    "prompt_tokens": 10,
    "total_tokens": 18
  }
}
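A client typically needs only a few fields from this response. The sketch below pulls the assistant text, the finish reason, and the token total out of a body shaped like the example above; `summarize_completion` is a hypothetical helper name, and the sample string repeats a subset of the example response.

```python
import json


def summarize_completion(response_text):
    """Extract the first choice and token usage from a chat completion body."""
    data = json.loads(response_text)
    first_choice = data["choices"][0]
    return {
        "content": first_choice["message"]["content"],
        "finish_reason": first_choice["finish_reason"],
        "total_tokens": data["usage"]["total_tokens"],
    }


# Subset of the example response shown above.
example_body = """
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {"content": "Hello! I am an AI assistant", "role": "assistant"}
    }
  ],
  "model": "llama-2-70b-chat-hf",
  "object": "chat.completion",
  "usage": {"completion_tokens": 8, "prompt_tokens": 10, "total_tokens": 18}
}
"""

summary = summarize_completion(example_body)
```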
Unsupported role (e.g., "system") or unexpected model in chat completions message.
No agent found with the specified ID.
Validation Error
{
  "detail": [
    {
      "loc": [
        {}
      ],
      "msg": "string",
      "type": "string"
    }
  ]
}
detail
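A validation error body of this shape can be flattened into readable lines for logging. The following is a minimal sketch; `format_validation_errors` is a hypothetical helper, and the sample values for loc, msg, and type are invented for illustration rather than taken from a real response.

```python
def format_validation_errors(error_body):
    """Turn a validation error body into one readable line per problem."""
    lines = []
    for item in error_body.get("detail", []):
        # loc is a list of path segments; join them into a dotted path.
        location = ".".join(str(part) for part in item.get("loc", []))
        lines.append(f"{location}: {item.get('msg', '')} ({item.get('type', '')})")
    return lines


# Invented sample values; only the field names come from the schema above.
sample_error = {
    "detail": [
        {"loc": ["body", "messages"], "msg": "field required", "type": "value_error.missing"}
    ]
}

formatted = format_validation_errors(sample_error)
```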
curl -X POST \
  -H 'Authorization: <value>' \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Hello!"}]}' \
  'https://{api_host}/api/v1/compatibility/openai/v1/assistants/{agent_id}/chat/completions'