Tags: node.js, express, node-streams

Why is streaming a response so much slower than sending a regular response in Node.js / Express?


I'm using a Node.js / Express server to query a Postgres database and send the results to the browser as a CSV file. The result set can get quite large (e.g. 50+ MB), so I thought it prudent to stream the results straight from the DB all the way to the browser like so:

const QueryStream = require('pg-query-stream');
const { Transform } = require('json2csv');

const pool = require('./pool-instance');

// ...some request handling code...

const client = await pool.connect();
const stream = client.query(new QueryStream(q.text, q.values));

stream.on('end', () => {
  client.release();
});

const json2csv = new Transform({}, {objectMode: true});
res.set('Content-Type', 'text/csv');
res.set('Content-Disposition', 'attachment;filename=export.csv');

// Pipe the query results through the CSV transform into the Express response.
stream.pipe(json2csv).pipe(res);

This worked well when tested locally, but when I tested it over the network on a small server it took over 20 seconds to stream a 1.3 MB file. So, I tried doing things in a more traditional way:

// Just load the full query results into memory
const { parseAsync } = require('json2csv');

const results = await pool.query(q);

// Create the full CSV text string from the query results
const csv = await parseAsync(results.rows);

res.set('Content-Type', 'text/csv');
res.set('Content-Disposition', 'attachment;filename=export.csv');

res.send(csv);

This took only 2 seconds for the same file.

Why is this? Why is the streaming approach so much slower?


Solution

  • I faced the same issue, and it seems the cause is not node-pg itself but the way Node streams handle buffering. The problem is described here: NodeJS Copying File over a stream is very slow

    The Node.js documentation about stream buffering says:

    Both Writable and Readable streams will store data in an internal buffer that can be retrieved using writable.writableBuffer or readable.readableBuffer, respectively.

    The amount of data potentially buffered depends on the highWaterMark option passed into the stream's constructor. For normal streams, the highWaterMark option specifies a total number of bytes. For streams operating in object mode, the highWaterMark specifies a total number of objects....

    A key goal of the stream API, particularly the stream.pipe() method, is to limit the buffering of data to acceptable levels such that sources and destinations of differing speeds will not overwhelm the available memory.
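
    To illustrate the distinction the docs are describing, here is a minimal sketch with plain Node streams (not pg-query-stream), using the defaults documented for Node.js:

    const { Readable } = require('stream');

    // Normal (byte) mode: highWaterMark is a byte count (default 16 KiB).
    const byteStream = new Readable({ highWaterMark: 64 * 1024, read() {} });

    // Object mode: highWaterMark is an object count (default 16).
    const objectStream = new Readable({ objectMode: true, highWaterMark: 1000, read() {} });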

    According to the QueryStream constructor definition here: https://github.com/brianc/node-postgres/blob/master/packages/pg-query-stream/src/index.ts#L24

    you can override the highWaterMark setting with a higher value (it defaults to 100). Here I chose 1000, but in some cases you might want to increase or decrease this value. Be careful with it, as it could lead to out-of-memory (OOM) problems if you run this in production.

    new QueryStream(query, [], {highWaterMark: 1000});
    

    Or, in your case:

    const stream = client.query(new QueryStream(q.text, q.values, {highWaterMark: 1000}));
    

    Also, make sure that none of the other streams piped with it have a lower highWaterMark value, which could slow the whole pipeline down; see the sketch below.
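
    For example, in the pipeline from the question you could pass a matching value to the json2csv Transform as well (a sketch, assuming its second argument is forwarded to the underlying stream.Transform, as the objectMode option in the question suggests):

    const json2csv = new Transform({}, { objectMode: true, highWaterMark: 1000 });
    stream.pipe(json2csv).pipe(res);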

    For me, this made the streamed download as fast as fetching the data directly from the database.


    I also found that on a low-performance CPU my stream was still too slow. In my case the culprit was Express' compression() middleware gzipping the response. I moved compression to the reverse proxy (Traefik) and everything worked well.
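
    If you keep compression in Express instead, one option is to skip gzip just for the CSV export using the middleware's filter option (a sketch; the route path is hypothetical):

    const compression = require('compression');

    app.use(compression({
      filter: (req, res) => {
        // Skip gzip for the CSV export; fall back to the default filter otherwise.
        if (req.path === '/export.csv') return false;
        return compression.filter(req, res);
      }
    }));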