Tags: javascript, express, stream, vercel, server-sent-events

SSE works locally, but not when deployed to Vercel


I have a proxy server that makes a request to OpenAI, which returns a Readable Stream object. The proxy server takes this readable stream and pipes the events back to the client.

My code works as intended when run locally, but not once deployed to Vercel.

When running locally:

  • The SSE connection is established
  • The data is received in many small chunks
  • The 'Transfer-Encoding' header is present with a value of 'chunked'

When deployed on Vercel:

  • The SSE connection is not established; instead, the request is treated like a regular REST API call.
  • The data is received in one big chunk
  • The 'Transfer-Encoding' header is NOT present; instead, there is an unexpected 'Content-Length' header.

app.post('/completions', (req, res) => {
    res.statusCode = 200;
    res.setHeader('Content-Type', 'text/event-stream');
    res.setHeader('Transfer-Encoding', 'chunked');
    res.setHeader('Cache-Control', 'no-cache');
    res.setHeader('X-Accel-Buffering', 'no');
    res.setHeader('Connection', 'keep-alive');

    const headers = {
        'Authorization': `Bearer MY_AUTH_TOKEN`
    };
    const body = {
        'messages': []
    };

    axios
    .post(
        'https://api.openai.com/v1/chat/completions',
        body,
        {
            headers: headers,
            responseType: 'stream'
        }
    )
    .then((open_ai_response) => {
        open_ai_response.data.pipe(res);
    })
    .catch((err) => {
        // Send only the message; res.send(err) would serialize an Error object to '{}'
        res.status(500).send(err.message);
    });
});
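To make the difference between the two deployments visible, the client can read the response incrementally instead of waiting for the full body. A minimal sketch, assuming the `/completions` route above (`onChunk` is a hypothetical callback, called once per received chunk — many times locally, but only once when the response is buffered):

```javascript
// Read a fetch() response body chunk by chunk, invoking onChunk for
// each chunk as it arrives, and return the full concatenated text.
async function readStream(response, onChunk) {
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let full = '';
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    const chunk = decoder.decode(value, { stream: true });
    full += chunk;
    onChunk(chunk);
  }
  return full;
}

// Usage:
// const response = await fetch('/completions', { method: 'POST' });
// await readStream(response, (chunk) => console.log('chunk:', chunk));
```

If `onChunk` fires only once with the entire payload, the response was buffered somewhere between the upstream and the client.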

Solution

  • TL;DR: You need to use the Edge runtime

    It looks like you might be using a Next.js custom server. According to that link (after the first paragraph):

    Note: A custom server cannot be deployed on Vercel.

    This is due to the limitations of Vercel (and serverless platforms in general), specifically the serverless function execution timeout. Prior to the introduction of Vercel's Edge Runtime, it was not possible to stream data to the client. Fortunately, the Edge Runtime does support Server-Sent Events, and the documentation for streaming events even points to an example using the GPT3 API.
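A rough sketch of what the proxy could look like as an Edge function, following the pattern in that documentation. This is an assumption-laden illustration, not the documented example itself: the route path, the model name, and the `OPENAI_API_KEY` environment variable are all placeholders you would adapt.

```javascript
// Hypothetical Edge API route (e.g. api/completions.js).
// 'edge' opts this function out of the serverless Node runtime.
export const config = { runtime: 'edge' };

export default async function handler(req) {
  const upstream = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      // OPENAI_API_KEY is an assumed env var name
      'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model: 'gpt-3.5-turbo', messages: [], stream: true }),
  });

  // Pass the upstream body through untouched; the Edge runtime
  // streams it to the client instead of buffering it.
  return new Response(upstream.body, {
    status: upstream.status,
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
    },
  });
}
```

The key difference from the Express version is that the Edge runtime uses web-standard `Request`/`Response` objects, and returning a `Response` wrapping a `ReadableStream` streams it chunk by chunk.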

    You might also read the page about Next.js Runtimes.

    "Both runtimes can also support streaming depending on your deployment infrastructure."

    On Vercel, the Node runtime is always deployed as serverless, and is therefore not compatible with Server-Sent Events.

    Finally, this discussion might have useful information (although the method described there is for hosting providers that support serverful Node runtimes).