node.js csv parsing papaparse

Parse Remote CSV File using Nodejs / Papa Parse?


I am currently working on parsing a remote CSV product feed from a Node app and would like to use Papa Parse to do that (I have had success with it in the browser in the past).

Papa Parse GitHub: https://github.com/mholt/PapaParse

My initial attempts and web searches haven't turned up exactly how this would be done. The Papa readme says that Papa Parse is now compatible with Node and, as such, Baby Parse (which used to serve some of the Node parsing functionality) has been deprecated.

Here's a link to the Node section of the docs for anyone stumbling on this issue in the future: https://github.com/mholt/PapaParse#papa-parse-for-node

From that doc paragraph it looks like Papa Parse in Node can parse a readable stream instead of a File. My question is:

Is there any way to use Readable Streams with Papa to download and parse a remote CSV in Node, somewhat similar to how Papa in the browser uses XMLHttpRequest to accomplish that same goal?

For future visibility: for those searching on the topic (and to avoid repeating a similar question), attempting to use the remote file parsing functionality described at http://papaparse.com/docs#remote-files from Node will result in the following error in your console:

"Unhandled rejection ReferenceError: XMLHttpRequest is not defined"

I have opened an issue on the official repository and will update this Question as I learn more about the problems that need to be solved.
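
The "XMLHttpRequest is not defined" error quoted above arises because the remote-file (download) path depends on a browser global that a plain Node process does not provide. A minimal sketch of a guard for that condition follows; the helper name `hasXmlHttpRequest` and its `env` parameter are made up for illustration and are not part of Papa Parse.

```javascript
// Hypothetical helper (not part of Papa Parse): report whether the given
// environment exposes an XMLHttpRequest constructor, which the browser-style
// remote download path needs.
function hasXmlHttpRequest(env = globalThis) {
  return typeof env.XMLHttpRequest === "function";
}

if (!hasXmlHttpRequest()) {
  // In this situation, fetch the file yourself and hand Papa a string or a
  // readable stream instead of a URL.
  console.log("No XMLHttpRequest here; URL-based parsing would fail.");
}
```

A check like this only detects the missing global. It does not make the URL-based parsing work in Node.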


Solution

  • Actually, you could use a lightweight stream transformation library called scramjet. Parsing CSV straight from an HTTP stream is one of my main examples. It also uses PapaParse to parse CSVs.

    Everything you wrote above, with any transforms in between, can be done in just a couple of lines:

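    const {StringStream} = require("scramjet");
    const request = require("request");                 // note: the request package is now deprecated

    request.get("https://srv.example.com/main.csv")     // fetch csv as a readable stream
        .pipe(new StringStream())                        // pass to a scramjet StringStream
        .CSVParse()                                      // parse into objects
        .consume(object => console.log("Row:", object))  // do whatever you like with the objects
        .then(() => console.log("all done"))
        .catch(err => console.error("CSV processing failed:", err)); // report errors raised while parsing or consuming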
    

    In your own example you're saving the file to disk, which is not necessary even with PapaParse.