Tags: javascript, node.js, csv, webpack, node-csv-parse

Streaming and parsing a CSV file in the browser (node/webpack)


A while back, I put together a Node project designed to run in the browser. One of its main jobs is to parse CSV files streamed from a server and operate on them in chunks. When I first built it, IE11 was a concern, and through an unhealthy amount of caffeine and probably some sacrifices to various elder gods, I managed to get something mostly working (obviously not with actual streaming, but close enough). Now I'm updating dependencies and reworking things, since IE11 is no longer a concern. However, between Webpack 5+ and a few other packages having breaking changes, I'm struggling to get CSV parsing to behave. The short version of what I used to do was:

import { ReadableWebToNodeStream } from "readable-web-to-node-stream";
var csv = require("csv-parse/lib/es5");

[...]

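// fetch() returns a Promise, so await it (inside an async function) before reading response.body
var response = await fetch(csv_url);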
new ReadableWebToNodeStream(response.body).pipe(csv())
  .on('data', async (row) => {
    // Deals with the data here
  }).on('end', async () => {
    // Finishes up
  });

This worked. The problem is that, with the changes to webpack (I assume), the ReadableWebToNodeStream library no longer works the way I expected, and I'd rather do it the right way now that IE11 is no longer forcing me to do horrible things. I still have to apply some transforms to the incoming data, so I'd prefer to keep the input as a stream to avoid consuming a huge amount of memory (these CSV files can be huge), but I'm struggling to find a way to use web ReadableStreams with csv-parse.

So I guess the advice I'd like is:

  1. Is there a way to parse a non-Node ReadableStream using csv-parse?
  2. Is there a way to configure webpack to have a stream behave like a node-style stream, despite running in the browser?

Solution

  • This issue stemmed from the update to webpack 5+, which stopped polyfilling Node core modules automatically. I mistakenly assumed that polyfilling fetch and streams would be enough, but I needed additional polyfills (in my case, process was the culprit). This question has more info on what changed in the update and what needs to be polyfilled. With everything polyfilled, I was able to pipe the CSVs I was getting through csv-parse as before.
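
For anyone hitting the same wall, here is a minimal sketch of what such a polyfill setup can look like in webpack.config.js. It assumes the stream-browserify, buffer, and process packages are installed as dependencies. The exact set of fallbacks depends on which Node core modules your own dependencies pull in, so treat the entries below as assumptions rather than the exact configuration used here.

```javascript
// webpack.config.js (sketch; merge these keys into your existing configuration)
const webpack = require("webpack");

module.exports = {
  // ...existing entry, output, and loader settings go here...
  resolve: {
    fallback: {
      // Webpack 5 no longer maps Node core modules to browser shims,
      // so each one that gets imported has to be listed explicitly.
      // Assumed packages: stream-browserify and buffer.
      stream: require.resolve("stream-browserify"),
      buffer: require.resolve("buffer/"),
    },
  },
  plugins: [
    // Node globals such as process and Buffer are not injected
    // automatically either; ProvidePlugin supplies them wherever
    // a module references them as free variables.
    new webpack.ProvidePlugin({
      process: "process/browser",
      Buffer: ["buffer", "Buffer"],
    }),
  ],
};
```

If the build then reports that "process/browser" cannot be resolved from an ES module, pointing the plugin at "process/browser.js" instead is a commonly reported workaround. Any remaining "Module not found" errors for other core modules usually mean another resolve.fallback entry is needed.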