javascript, csv, papaparse

Get just the header from a remote CSV file using Papa Parse


I need to extract just the header from a remote csv file.

My current method is as follows:

Papa Parse can stream the data and hand me each row individually, which is great. I can stop the stream with parser.abort() so that parsing goes no further than the first row. This looks as follows:

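Papa.parse(csv_file_and_path, {
    header: true,
    worker: true,
    download: true,
    step: function(row, parser) {
        // DO MY STUFF HERE (with header: true, the column names are in row.meta.fields)
        parser.abort(); // stop parsing after the first row
    }
});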

This works, but because the file is remote, it has to be downloaded in order to be read. Although the code returns control to the browser after the first line has been parsed, the download continues long after I have the information I need. For large files, this can take a long time.

Is there a more efficient way of doing this? Can I prevent Papa Parse from downloading the whole file?

I have tried using

Papa.parse(csv_file, {
    header: true,
    download: true,
    preview: 1,
    complete: function(results) {
        // DO MY STUFF HERE
    }
});

But this does the same thing: it downloads the entire file, although, as with the first approach, it gives control back to the browser after the first line is parsed.


Solution

  • The solution I came up with is very similar to the code in my question. The difference is that I abort the parser, supply a complete callback, and clear the results from memory.

    With the following method, only a single chunk of the file is downloaded. For a large file this massively reduces bandwidth, because no downloading continues after the first line has been parsed.

    Papa.parse(csv_file, {
        header: true,
        download: true,
        step: function(results, parser) {

            // DO MY THING HERE

            parser.abort();  // stop parsing after the first row
            results = null;  // attempt to clear the results from memory
                             // (`delete results` was dropped: delete has no effect on a variable)

        },
        complete: function(results) {

            results = null;  // attempt to clear the results from memory

        }
    });
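If the goal is only the header row, another option is to skip Papa Parse's downloader and read the response stream directly, cancelling the request as soon as the first line is complete. The following is a minimal sketch rather than a tested drop-in: `firstLine` and `fetchCsvHeader` are hypothetical helper names, it assumes a browser (or runtime) that provides `fetch`, `AbortController`, and `TextDecoder`, that `Papa` is already loaded as a global, and that the header row contains no quoted line breaks.

```javascript
// Hypothetical helper: returns the text before the first line break,
// or null if the buffer does not yet contain a complete line.
function firstLine(buffer) {
  const match = /\r\n|\n|\r/.exec(buffer);
  return match ? buffer.slice(0, match.index) : null;
}

// Hypothetical helper: downloads only as much of the file as is needed
// to see the first line, then aborts the request.
async function fetchCsvHeader(url) {
  const controller = new AbortController();
  const response = await fetch(url, { signal: controller.signal });
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder(); // UTF-8 is assumed
  let buffer = "";
  let line = null;

  try {
    while (line === null) {
      const { done, value } = await reader.read();
      if (done) {
        // End of file reached: the whole file may be a single line.
        buffer += decoder.decode();
        line = firstLine(buffer) ?? buffer;
        break;
      }
      buffer += decoder.decode(value, { stream: true });
      line = firstLine(buffer);
    }
  } finally {
    controller.abort(); // stop the transfer once the header is in hand
  }

  // Parsing a string with Papa Parse is synchronous; without header: true,
  // data[0] is the array of fields on the first row.
  return Papa.parse(line).data[0] ?? [];
}
```

Usage would look like `fetchCsvHeader(csv_file).then(fields => { /* DO MY THING HERE */ })`. How much data arrives before the abort takes effect depends on the browser and server, so the saving is best confirmed in the network tab.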