API Responses
Understanding API response formats and structures
The chat completion object
Represents a chat completion response returned by the model, based on the provided input.
Response Fields
choices
Type: array
A list of chat completion choices. Can be more than one if n is greater than 1.
- index (integer): The index of this choice in the list
- message (object): The chat message generated by the model
  - role (string): The role of the message author (e.g., "assistant")
  - content (string): The content of the message
  - refusal (string or null): The refusal message if the model refuses to answer
  - annotations (array): Additional annotations on the message
- logprobs (object or null): Log probability information
- finish_reason (string): The reason the model stopped generating (e.g., "stop", "length", "content_filter")
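As a sketch of how these fields fit together, the helper below pulls the assistant text out of each choice while skipping refusals. The `extract_texts` name and the abbreviated `sample` payload are illustrative, not part of the API.

```python
def extract_texts(response: dict) -> list:
    """Collect the assistant text from every choice, skipping refused answers."""
    texts = []
    for choice in response["choices"]:
        message = choice["message"]
        # A non-null refusal means the model declined to answer this choice.
        if message.get("refusal") is None:
            texts.append(message["content"])
    return texts

# Abbreviated illustrative payload following the schema above (not real API output).
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!", "refusal": None},
            "finish_reason": "stop",
        }
    ]
}
```

Because `choices` is an array, the same loop handles `n` greater than 1 without changes.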
created
Type: integer
The Unix timestamp (in seconds) of when the chat completion was created.
id
Type: string
A unique identifier for the chat completion.
model
Type: string
The model used for the chat completion.
object
Type: string
The object type, which is always chat.completion.
service_tier
Type: string or null
Specifies the latency tier to use for processing the request. This parameter is relevant for customers subscribed to the scale tier service:
- If set to 'auto', and the Project is Scale tier enabled, the system will utilize scale tier credits until they are exhausted.
- If set to 'auto', and the Project is not Scale tier enabled, the request will be processed using the default service tier with a lower uptime SLA and no latency guarantee.
- If set to 'default', the request will be processed using the default service tier with a lower uptime SLA and no latency guarantee.
- If set to 'flex', the request will be processed with the Flex Processing service tier.
- When not set, the default behavior is 'auto'.
When this parameter is set, the response body will include the service_tier utilized.
system_fingerprint
Type: string
This fingerprint represents the backend configuration that the model runs with.
Can be used in conjunction with the seed request parameter to understand when backend changes have been made that might impact determinism.
usage
Type: object
Usage statistics for the completion request.
- prompt_tokens (integer): Number of tokens in the prompt
- completion_tokens (integer): Number of tokens in the generated completion
- total_tokens (integer): Total number of tokens used in the request (prompt + completion)
- prompt_tokens_details (object): Detailed breakdown of prompt tokens
  - cached_tokens (integer): Number of tokens that were cached
  - audio_tokens (integer): Number of audio tokens in the prompt
- completion_tokens_details (object): Detailed breakdown of completion tokens
  - reasoning_tokens (integer): Number of tokens used for reasoning
  - audio_tokens (integer): Number of audio tokens in the completion
  - accepted_prediction_tokens (integer): Number of accepted prediction tokens
  - rejected_prediction_tokens (integer): Number of rejected prediction tokens
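The token accounting is additive: total_tokens is the sum of prompt_tokens and completion_tokens. A minimal sanity check might look like this (the `check_usage` helper is illustrative; the sample figures come from the example response below):

```python
def check_usage(usage: dict) -> bool:
    """Verify that total_tokens equals prompt_tokens + completion_tokens."""
    return usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]

# Figures taken from the example response in this section.
sample_usage = {"prompt_tokens": 1117, "completion_tokens": 46, "total_tokens": 1163}
```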
Example Response
{
  "id": "chatcmpl-8YMnDbsifkBeAs814beb0dFOJdPeG",
  "object": "chat.completion",
  "created": 1741570285,
  "model": "gpt-4o-2024-08-06",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The image shows a wooden boardwalk extending through a forest.",
        "refusal": null,
        "annotations": []
      },
      "logprobs": null,
      "finish_reason": "stop",
      "feedback_token": "fb_8YMnDbsifkBeAs814beb0dFOJdPeG_0",
      "duration_seconds": 2.34,
      "cost_usd": 0.001234
    }
  ],
  "usage": {
    "prompt_tokens": 1117,
    "completion_tokens": 46,
    "total_tokens": 1163,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "service_tier": "default",
  "system_fingerprint": "fp_fc7f1d7035"
}
Extra fields in completion response
In the completion response, the choice object includes the following fields that are not defined in the OpenAI API:
- feedback_token: the feedback token for the completion, as described in the User Feedback documentation.
- duration_seconds: the duration of the completion in seconds.
- cost_usd: the cost of the generation in USD for that specific choice.
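Since these fields are specific to this API and absent from standard OpenAI responses, client code should read them defensively. A sketch, using `.get()` so missing fields yield `None` rather than a `KeyError` (the `choice_metadata` name is illustrative; the sample values come from the example response above):

```python
def choice_metadata(choice: dict) -> dict:
    """Collect the extra per-choice fields, tolerating their absence."""
    return {
        "feedback_token": choice.get("feedback_token"),
        "duration_seconds": choice.get("duration_seconds"),
        "cost_usd": choice.get("cost_usd"),
    }

# Values taken from the example response above.
sample_choice = {
    "index": 0,
    "feedback_token": "fb_8YMnDbsifkBeAs814beb0dFOJdPeG_0",
    "duration_seconds": 2.34,
    "cost_usd": 0.001234,
}
```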
Streaming Responses
When stream: true is set, the response will be sent as data-only server-sent events. See the Streaming guide for more details on handling streaming responses.
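A data-only SSE stream is a sequence of `data: <json>` lines terminated by a `data: [DONE]` sentinel. As a minimal parsing sketch, assuming the standard streaming chunk shape where each chunk carries a `delta` with partial content (the `parse_sse_chunks` helper and the hard-coded `stream` string are illustrative):

```python
import json

def parse_sse_chunks(raw: str):
    """Yield each JSON event from a data-only SSE body, stopping at [DONE]."""
    for line in raw.splitlines():
        if not line.startswith("data: "):
            continue  # skip the blank separator lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(payload)

# Illustrative stream body; real chunks are fuller chat.completion.chunk objects.
stream = (
    'data: {"choices": [{"delta": {"content": "Hel"}}]}\n\n'
    'data: {"choices": [{"delta": {"content": "lo"}}]}\n\n'
    'data: [DONE]\n\n'
)
text = "".join(c["choices"][0]["delta"]["content"] for c in parse_sse_chunks(stream))
```

In practice an HTTP client that exposes the response as an iterator of lines (rather than one string) slots into the same loop.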