Tags: node.js, firebase, google-bigquery, google-cloud-functions, nodejs-stream

Firebase function Node.js transform stream


I'm creating a Firebase HTTP Function that makes a BigQuery query and returns a modified version of the query results. The query potentially returns millions of rows, so I cannot store the entire query result in memory before responding to the HTTP client. I am trying to use Node.js streams, and since I need to modify the results before sending them to the client, I am trying to use a transform stream. However, when I try to pipe the query stream through my transform stream, the Firebase Function crashes with the following error message: finished with status: 'response error'.

My minimal reproducible example is as follows. I am using a buffer, because I don't want to process a single row (chunk) at a time, since I need to make asynchronous network calls to transform the data.

return new Promise(async (resolve, reject) => { // executor must be async for the await below
    const buffer = new Array(5000)
    let bufferIndex = 0
    const [job] = await bigQuery.createQueryJob(options)
    const bqStream = job.getQueryResultsStream()

    const transformer = new Transform({
        writableObjectMode: true,
        readableObjectMode: false,
        transform(chunk, enc, callback) {
            buffer[bufferIndex] = chunk
            if (bufferIndex < buffer.length - 1) {
                bufferIndex++
            }
            else {
                this.push(JSON.stringify(buffer).slice(1, -1)) // Transformation should happen here.
                bufferIndex = 0
            }
            callback()
        },
        flush(callback) {
            if (bufferIndex > 0) {
                this.push(JSON.stringify(buffer.slice(0, bufferIndex)).slice(1, -1))
            }
            this.push("]")
            callback()
        },
    })

    bqStream
    .pipe(transformer)
        .pipe(response)

    bqStream.on("end", () => {
        resolve()
    })
})

Solution

  • I cannot store the entire query result in memory before responding to the HTTP client

    Unfortunately, when using Cloud Functions, this is precisely what must happen.

    There is a documented limit of 10MB for the response payload, and that is effectively stored in memory as your code continues to write to the response. Streaming of requests and responses is not supported.

    One alternative is to write your response to an object in Cloud Storage, then send a link or reference to that file to the client so it can read the response fully from that object.
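That alternative might look like the sketch below, assuming the `@google-cloud/storage` client; the bucket name, object path, and `exportQueryToGcs` function are placeholders I've invented for illustration, and error handling is minimal:

```javascript
const { Storage } = require("@google-cloud/storage");
const { Transform } = require("stream");

// Sketch: stream BigQuery results into a Cloud Storage object, then hand the
// client a time-limited signed URL instead of streaming rows in the response.
async function exportQueryToGcs(job, response) {
  const storage = new Storage();
  const file = storage
    .bucket("my-results-bucket") // placeholder bucket
    .file(`results/${job.id}.json`);

  // Minimal serializer: one JSON document per line (NDJSON).
  const toJsonLines = new Transform({
    writableObjectMode: true,
    transform(row, _enc, callback) {
      callback(null, JSON.stringify(row) + "\n");
    },
  });

  await new Promise((resolve, reject) => {
    job.getQueryResultsStream()
      .pipe(toJsonLines)
      .pipe(file.createWriteStream({ contentType: "application/x-ndjson" }))
      .on("finish", resolve)
      .on("error", reject);
  });

  // A signed URL lets the client fetch the full result from Cloud Storage,
  // bypassing the function's 10MB response limit.
  const [url] = await file.getSignedUrl({
    action: "read",
    expires: Date.now() + 60 * 60 * 1000, // valid for 1 hour
  });
  response.json({ resultUrl: url });
}
```

The function's own response then stays tiny (a single URL), regardless of how many rows the query returns.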

    If you need to send a large streamed response, Cloud Functions is not a good choice. Neither is Cloud Run, which is similarly limited. You will need to look into other solutions that allow direct socket access, such as Compute Engine.