Streaming

Learn how to stream model responses from the WorkflowAI API using server-sent events.

By default, when you make a request to the WorkflowAI API, the model generates the entire output before returning it in a single HTTP response. For longer outputs, this can mean waiting until the full response is ready. With streaming, you can start receiving and processing the model's output as it is generated, allowing you to display or use partial results in real time.

Enable streaming

Streaming is available exactly like in the OpenAI API. If you are already streaming from the OpenAI API, no code change is required.

To stream completions, set stream=True in the request.

The response is sent back incrementally in chunks with an event stream. You can iterate over the event stream with a for loop, like this:

from openai import OpenAI
client = OpenAI(
    base_url="https://run.workflowai.com/v1",
    api_key="wai--***",
)

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {
            "role": "user",
            "content": "Say 'double bubble bath' ten times fast.",
        },
    ],
    stream=True, 
)

for chunk in stream:
    print(chunk)
    print(chunk.choices[0].delta)
    print("****************")

Read the responses

When you stream a chat completion, the responses have a delta field rather than a message field. The delta field can hold a role token, content token, or nothing.

{ role: 'assistant', content: '', refusal: null }
****************
{ content: 'Why' }
****************
{ content: " don't" }
****************
{ content: ' scientists' }
****************
{ content: ' trust' }
****************
{ content: ' atoms' }
****************
{ content: '?\n\n' }
****************
{ content: 'Because' }
****************
{ content: ' they' }
****************
{ content: ' make' }
****************
{ content: ' up' }
****************
{ content: ' everything' }
****************
{ content: '!' }
****************
{}
****************

To stream only the text response of your chat completion, your code would look like this:

from openai import OpenAI
client = OpenAI(
    base_url="https://run.workflowai.com/v1",
    api_key="wai--***",
)

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {
            "role": "user",
            "content": "Say 'double bubble bath' ten times fast.",
        },
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

Streaming complete JSON outputs

When using JSON outputs, it can be quite complicated to piece together the deltas in order to get a usable partial JSON object. The final payload is split into multiple chunks that only amount to a valid JSON object at the very end.

For example, when trying to extract a user with a name, age and email in JSON form:

{
  "name": "John Doe",
  "age": 30,
  "email": "john.doe@example.com"
}

The chunks can have the following content deltas:

{"name
":"
"John
Doe
",
...

By setting the stream_options.valid_json_chunks parameter to true, WorkflowAI can aggregate the deltas into valid partial JSON objects, transforming the above deltas into:

# Chunks that do not represent an update of the JSON object are ignored
# Each chunk is a valid JSON object
{"name": "John"}
{"name": "John Doe"}
...

streamer = await openai_client.chat.completions.create(
    model="gpt-4o",
    metadata={"agent_id": "my-agent"},
    messages=...,
    stream=True,
    # The following can break typing since `valid_json_chunks` is not supported by OpenAI,
    # one solution is to cast to ChatCompletionStreamOptionsParam
    stream_options={"valid_json_chunks": True},
    response_format={"type": "json_object"}, # or "json_schema"
)

async for chunk in streamer: # Every content delta is a valid JSON object
    print(json.loads(chunk.choices[0].delta.content))

const completion = await openai.chat.completions.create({
    model: 'gpt-4o',
    metadata: { agent_id: 'my-agent' },
    messages: ...,
    response_format: { type: 'json_object' }, // or "json_schema"
    stream: true,
    stream_options: {
      //@ts-expect-error - valid_json_chunks is not supported by OpenAI
      valid_json_chunks: true,
    },
})
for await (const chunk of completion) {
    if (chunk.choices[0].delta.content) {
        console.log(JSON.parse(chunk.choices[0].delta.content));
    }
}

curl -X POST https://run.workflowai.com/v1/chat/completions \
  -H "Authorization: Bearer wai--***" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [],
    "stream": true,
    "stream_options": {
      "valid_json_chunks": true
    },
    "response_format": {
      "type": "json_object"
    },
    "metadata": {
      "agent_id": "my-agent"
    }
}'

Streaming

Enable streaming

Read the responses

Streaming complete JSON outputs

On this page