ChatGPT (Chat)

Create chat completion (non-streaming)

POST
/v1/chat/completions
Given a prompt, the model returns one or more predicted completions, and can also return the probabilities of alternative tokens at each position.
Creates a completion for the provided messages and parameters.

Request

Header Params
Content-Type
string 
required
Example:
application/json
Accept
string 
required
Example:
application/json
Authorization
string 
optional
Example:
Bearer {{YOUR_API_KEY}}
Body Params application/json
model
string 
required
The ID of the model to use. For more information about which models can be used with the Chat API, see the model endpoint compatibility table.
messages
array [object {2}] 
required
List of messages comprising the conversation so far.
role
string 
optional
content
string 
optional
temperature
number 
optional
What sampling temperature to use, between 0 and 2. Higher values (like 0.8) make the output more random, while lower values (like 0.2) make it more focused and deterministic. We generally recommend altering this or top_p but not both.
top_p
number 
optional
An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens comprising the top_p probability mass. So 0.1 means only the tokens comprising the top 10% of probability mass are considered. We generally recommend altering this or temperature but not both.
n
integer 
optional
Default is 1
How many chat completion choices are generated for each input message.
stream
boolean 
optional
Defaults to false. If set, partial message deltas will be sent, as in ChatGPT. Tokens are sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.
stop
string 
optional
Defaults to null. Up to 4 sequences at which the API will stop generating further tokens.
max_tokens
integer 
optional
Default is inf
The maximum number of tokens to generate in the chat completion.
The combined length of input tokens and generated tokens is limited by the model's context length.
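Because input and output share the context window, it can help to sanity-check the token budget before sending a request. A minimal sketch, assuming a hypothetical 128k-token context window and a rough 4-characters-per-token approximation (use the model's real tokenizer for accurate counts):

```python
# Rough budget check: approximate prompt tokens + max_tokens must fit the
# context window. The 4-chars-per-token ratio is only a crude approximation.
CONTEXT_WINDOW = 128_000  # assumed context length for the chosen model

def fits_context(messages, max_tokens, context_window=CONTEXT_WINDOW):
    """Return True if the (approximate) prompt size plus max_tokens fits."""
    approx_prompt_tokens = sum(len(m["content"]) // 4 for m in messages)
    return approx_prompt_tokens + max_tokens <= context_window

messages = [{"role": "user", "content": "Hello!"}]
print(fits_context(messages, 1000))  # a small request easily fits: True
```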
presence_penalty
number 
optional
A number between -2.0 and 2.0. Positive values penalize new tokens based on whether they have appeared in the text so far, increasing the model's likelihood of talking about new topics. See more information about frequency and presence penalties.
frequency_penalty
number 
optional
Defaults to 0. A number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood of repeating the same line verbatim. See more information about frequency and presence penalties.
logit_bias
object 
optional
Modifies the likelihood of specified tokens appearing in the completion.
Accepts a JSON object that maps tokens (token IDs from the tokenizer) to associated bias values (-100 to 100). Mathematically, the bias is added to the logits generated by the model before sampling. The exact effect varies per model, but values between -1 and 1 should decrease or increase the likelihood of selection, while values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
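As an illustration, a request body that bans one token and mildly encourages another could be built like the sketch below. The token IDs are placeholders, not real IDs from any tokenizer; look the real ones up with the tokenizer for your model.

```python
# Sketch: build a logit_bias map. Keys are token IDs (as strings in JSON),
# values range from -100 (effectively ban) to 100 (effectively force).
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Pick a color."}],
    "logit_bias": {
        "1234": -100,  # hypothetical token ID: ban this token
        "5678": 5,     # hypothetical token ID: mildly encourage this token
    },
}

# Every bias value must lie within the documented range.
assert all(-100 <= v <= 100 for v in payload["logit_bias"].values())
```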
user
string 
optional
A unique identifier that represents your end user and helps OpenAI monitor and detect abuse. Learn more.
response_format
object 
optional
An object specifying the format that the model must output. Setting { "type": "json_object" } enables JSON mode, which ensures the messages the model generates are valid JSON. Important: when using JSON mode, you must also instruct the model to produce JSON via a system or user message. Otherwise, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in increased latency and the appearance of a "stuck" request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length.
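For example, a JSON-mode request body might look like the following sketch; note the system message explicitly asking for JSON, as the description above requires:

```python
import json

# Sketch of a JSON-mode request body. The system message instructing the
# model to answer in JSON is mandatory when response_format is json_object.
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system",
         "content": "You are a helpful assistant. Reply only with valid JSON."},
        {"role": "user", "content": "List three primary colors."},
    ],
    "response_format": {"type": "json_object"},
}

# A well-formed reply in JSON mode can be parsed directly, e.g.:
sample_reply = '{"colors": ["red", "yellow", "blue"]}'
print(json.loads(sample_reply)["colors"])  # ['red', 'yellow', 'blue']
```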
seed
integer 
optional
This feature is in beta. If specified, our system will do its best to sample deterministically so that repeated requests with the same seed and parameters should return the same results. Determinism is not guaranteed and you should refer to the system_fingerprint response parameter to monitor the backend for changes.
tools
array[object]
optional
A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for.
tool_choice
object 
optional
Controls which (if any) function the model calls. none means the model will not call a function and instead generates a message. auto means the model can pick between generating a message or calling a function. Specifying a particular function via {"type": "function", "function": {"name": "my_function"}} forces the model to call that function. none is the default when no functions are present; auto is the default when functions are present.
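Putting tools and tool_choice together, a request that forces a call to a hypothetical get_weather function could be sketched as:

```python
# Sketch: force the model to call one specific function.
# get_weather and its parameter schema are hypothetical examples.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
    # Force a call to get_weather rather than letting the model decide:
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}
```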
Example
{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "max_tokens":1000
  }

Request samples

Shell
curl --location --request POST '/v1/chat/completions' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer {{YOUR_API_KEY}}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "max_tokens":1000
  }'
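The same request can be made from Python. A minimal sketch using only the standard library, where the base URL and API key come from placeholder environment variables (API_BASE_URL and YOUR_API_KEY are assumptions, not documented names):

```python
import json
import os
import urllib.request

def build_request(messages, model="gpt-4o", max_tokens=1000):
    """Build the POST request for /v1/chat/completions (non-streaming)."""
    base_url = os.environ.get("API_BASE_URL", "https://api.example.com")
    body = {"model": model, "messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": "Bearer " + os.environ.get("YOUR_API_KEY", ""),
        },
        method="POST",
    )

def create_chat_completion(messages, **kwargs):
    """Send the request and return the parsed JSON response body."""
    with urllib.request.urlopen(build_request(messages, **kwargs)) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    print(create_chat_completion([{"role": "user", "content": "Hello!"}]))
```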

Responses

🟢 200 OK
application/json
Body
id
string 
required
object
string 
required
created
integer 
required
choices
array [object {3}] 
required
index
integer 
optional
message
object 
optional
finish_reason
string 
optional
usage
object 
required
prompt_tokens
integer 
required
completion_tokens
integer 
required
total_tokens
integer 
required
Example
{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "\n\nHello there, how may I assist you today?"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 9,
        "completion_tokens": 12,
        "total_tokens": 21
    }
}
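The fields above can be pulled out of the parsed response in a few lines; a sketch using the example body:

```python
import json

# Parse the example response body shown above and extract the useful fields.
raw = '''
{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "choices": [
        {"index": 0,
         "message": {"role": "assistant",
                     "content": "\\n\\nHello there, how may I assist you today?"},
         "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21}
}
'''
response = json.loads(raw)

# With n > 1 there would be several entries in "choices"; here there is one.
answer = response["choices"][0]["message"]["content"].strip()
finish_reason = response["choices"][0]["finish_reason"]  # "stop", "length", ...
total_tokens = response["usage"]["total_tokens"]

print(answer)        # Hello there, how may I assist you today?
print(total_tokens)  # 21
```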