
Is it possible to end Node stream chunks at a certain character?


I have a fairly large XML file that I'm streaming to a function, like this:

var stream = fs.createReadStream(__dirname + '/File.xml').pipe(myfunction);

What the receiving function does isn't really important, except that it splits the stream into the strings I want and runs decodeURIComponent on each of them. The problem is that some chunks end partway through an encoded string:

01 %E5%8A%87%E4%BC%B4%E7%89%%9E%8B1%E2%98%86%E6%A5%B5%E2%98%85%E6.csv
02 %E3%83%AA%E3%82%B9%E3%82%BC%B7%E5%8C%96%E5%9E%8B2%E2%98%86%E6.csv
03 %E6%97%A5%E5%8B3%E2%98%86%E6%A5%B5%E2%98%85%E6%9C%8D.csv
04 %E6%9C%8D%E7%9D%B1%9A%E5%9E%8B4%E2%98%86%E6%A5%B5%E2%98%85%E6%9C%8D.csv
05 %E5%90%8D%E4%BB%98%E6%89%87%E5%

As you can see, the final filename is cut off partway through one of the encoded characters.
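The cut is not specific to this file. fs.createReadStream hands out data in fixed-size pieces (64 KiB by default, set by its highWaterMark option) without looking at the content. A minimal sketch of the effect follows. It assumes a made-up two-line sample and a tiny 7-character chunk size in place of the real one, and `fixedSizeChunks` and `tryDecode` are hypothetical helpers written only for this illustration:

```javascript
// Hypothetical helper: slices text into fixed-size pieces, imitating
// how a read stream cuts data without regard to its content.
function fixedSizeChunks(text, size) {
  const chunks = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical helper: returns the decoded string, or null when the text
// holds an incomplete escape sequence (decodeURIComponent throws URIError).
function tryDecode(text) {
  try {
    return decodeURIComponent(text);
  } catch (err) {
    if (err instanceof URIError) return null;
    throw err;
  }
}

// Illustrative sample: two filenames, each containing one encoded star.
const sample = 'a%E2%98%86.csv\nb%E2%98%85.csv\n';
for (const chunk of fixedSizeChunks(sample, 7)) {
  console.log(JSON.stringify(chunk), '->', tryDecode(chunk));
}
```

Every slice except the last one fails to decode, even though decodeURIComponent on a whole line such as 'a%E2%98%86.csv' succeeds.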

Is it possible to force stream chunks to end at a certain character or regex match, e.g. right after the .csv? I haven't found a solution for this anywhere, which makes me think I'm taking the wrong approach.

Alternatively, I could simply append the output of each buffer to a hugeString and then operate on that, but that doesn't seem to be in keeping with the advantages Node's streams are supposed to offer.


Solution

  • You can't force the native stream chunks to end at any given spot; their boundaries are set by the code that reads the stream, not by the content. What you can do is write your own code that reads the stream and buffers data until it holds a whole piece, then trigger your own event or callback to announce each complete piece. Alternatively, you can pipe the stream into a transform stream that does that buffering for you, for example one that breaks the input into lines.

    Here's a good article on how a transform stream can be used to split input into lines, which is essentially the same problem you're asking about.
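Both options come down to the same buffering step: keep the unfinished tail of the previous chunk and only hand on text that ends at the delimiter. Here is a minimal sketch of the transform-stream version. It assumes the pieces you want always end with ".csv" (taken from the sample output above) and that the file is UTF-8. `splitAfter` and `createDelimiterSplitter` are hypothetical names, not part of Node's API; only `Transform` and `StringDecoder` come from Node itself.

```javascript
const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Splits `text` right after every occurrence of `delimiter` and returns
// the complete pieces plus the trailing remainder that has no delimiter yet.
function splitAfter(text, delimiter) {
  const pieces = [];
  let start = 0;
  let idx;
  while ((idx = text.indexOf(delimiter, start)) !== -1) {
    const cut = idx + delimiter.length;
    pieces.push(text.slice(start, cut));
    start = cut;
  }
  return { pieces, rest: text.slice(start) };
}

// Hypothetical helper: re-chunks a byte stream so that every emitted chunk
// ends with `delimiter` (the final chunk may not, if the input lacks one).
function createDelimiterSplitter(delimiter = '.csv') {
  // StringDecoder holds back incomplete multi-byte UTF-8 sequences
  // until the rest of their bytes arrive in the next Buffer.
  const decoder = new StringDecoder('utf8');
  let pending = ''; // text received but not yet terminated by the delimiter

  return new Transform({
    readableObjectMode: true, // each push() stays a separate 'data' chunk
    transform(chunk, encoding, callback) {
      const { pieces, rest } = splitAfter(pending + decoder.write(chunk), delimiter);
      for (const piece of pieces) this.push(piece);
      pending = rest;
      callback();
    },
    flush(callback) {
      // Emit whatever is left once the input ends.
      const tail = pending + decoder.end();
      if (tail.length > 0) this.push(tail);
      callback();
    },
  });
}

// Usage with the question's file (not run here):
// const fs = require('fs');
// fs.createReadStream(__dirname + '/File.xml')
//   .pipe(createDelimiterSplitter('.csv'))
//   .on('data', (piece) => { /* extract and decodeURIComponent the filename */ });
```

Object mode on the readable side is a deliberate choice: it keeps each pushed piece as its own chunk, whereas in byte mode downstream consumers may see pushed strings concatenated again. If ".csv" can also appear elsewhere in the XML, the delimiter (or a regex-based split) would need adjusting.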