
What is the default number of bytes returned by the read() method of the Readable stream class?


The following snippet creates a readable stream from a file and then listens for the 'readable' event to start receiving data from the stream.

    const fs = require("fs");

    const rstream = fs.createReadStream(this.getFileName(), {
        encoding: "utf8"
    });

    /**
     * Reference: https://nodejs.org/api/stream.html#stream_readable_streams
     *
     * Adding a 'readable' event handler automatically makes the stream stop flowing;
     * the data must then be consumed via readable.read().
     */

    rstream.on("readable", () => {
        let data;
        // read() returns null once the internal buffer is empty.
        while ((data = rstream.read()) !== null) {
            console.log(data, " *");
        }
    });

The read() function accepts a size argument that specifies how many bytes to read from the stream. What is the default number of bytes returned from the stream, and how does it work? For example, if my file has newline-separated tokens, will it always return the tokens from the new line, or could it be a partial result, where the last result contains only the first two characters of the next line?

Update:

I also read about an option, highWaterMark, which possibly defines the chunk size for the buffered stream. How does this work? I tried the following:

    const rstream = fs.createReadStream(this.getFileName(), {
        encoding: "utf8",
        highWaterMark: 64 * 1024
    });

Does this mean that a chunk will be no smaller than 64 * 1024 bytes? When I tried reading a file with the above highWaterMark configuration, the program read 8 bytes the first time and 11 bytes the next time, for a file size of 19 bytes. Shouldn't it have read the complete 19 bytes at once?


Solution

> Will it always return the tokens from the new line, or could it be a partial result?

No, it will not always return a full line; you have to be prepared to get a partial line. If you want line-by-line results, you can use the readline module's interface (readline.createInterface()), which buffers partial lines for you and only emits a line once it is complete.
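
A minimal sketch of the readline approach, assuming a recent Node.js (16 or later, for the node: import prefix and async iteration over the interface). collectLines is a hypothetical helper name, and Readable.from() is used only to simulate a stream whose chunk boundaries deliberately fall in the middle of lines:

```javascript
const { Readable } = require("node:stream");
const readline = require("node:readline");

// Hypothetical helper: gather complete lines from any readable byte stream.
async function collectLines(input) {
  // crlfDelay: Infinity makes "\r\n" always count as a single line break.
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  const lines = [];
  for await (const line of rl) {
    lines.push(line);
  }
  return lines;
}

// Simulated chunks whose boundaries split lines, the way raw read() results can.
const chunks = ["first li", "ne\nsecond line\nthi", "rd line"].map((s) => Buffer.from(s));

collectLines(Readable.from(chunks)).then((lines) => {
  console.log(lines); // [ 'first line', 'second line', 'third line' ]
});
```

For the file in the question, pass fs.createReadStream(this.getFileName()) as the input instead of the simulated stream.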

> What is the default number of bytes returned from the stream, or how does it work?

A stream uses an internal buffer (which you have some control over). The call to .read() is non-blocking and only hands back data that is already in that buffer. Called with no size argument, it returns everything currently buffered. Called with a size, it returns that amount if that much is buffered; otherwise it returns null, unless the stream has ended, in which case it returns whatever is left. So if the buffer doesn't have many bytes in it, and particularly as you get towards the end of the buffer, you absolutely can get partial results before the stream has received the next set of bytes from the file.
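
The sketch below illustrates those read() rules on a hand-fed stream. It is an illustration under assumptions, not how fs streams are implemented: a Readable with a no-op read() implementation stands in for the file so that exactly what is buffered at each step is known, and demoRead is a hypothetical name. Because an encoding is set, sizes here count string characters:

```javascript
const { Readable } = require("node:stream");

function demoRead() {
  // No-op read(): data only arrives when we push() it ourselves.
  const r = new Readable({ encoding: "utf8", read() {} });
  const results = [];

  r.push("hello");           // 5 characters now buffered
  results.push(r.read(10));  // null: fewer than 10 buffered and the stream has not ended
  results.push(r.read());    // "hello": no size means "everything currently buffered"

  r.push(" world");
  r.push(null);              // signal end of data
  results.push(r.read(100)); // " world": the stream has ended, so the remainder is returned
  results.push(r.read());    // null: nothing left
  return results;
}

console.log(demoRead()); // [ null, 'hello', ' world', null ]
```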

So there is no fixed "default" for how many bytes will be ready on the first read. It depends on timing: how much time has elapsed between opening the stream and reading from it, how fast your drive is, and how much contention there is for CPU and I/O on your system. If you don't read for a little while, the stream should fill its internal buffer. If you read immediately, it may not have put much, or anything, into the buffer yet.
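
Because the amount returned by each read() call is unpredictable, code that consumes the stream directly, as the question's 'readable' handler does, has to carry any incomplete trailing line over to the next read. A minimal sketch, assuming "\n" line endings and UTF-8 text; readLines is a hypothetical helper, and Readable.from() (with objectMode turned off) simulates a file whose chunks split lines:

```javascript
const { Readable } = require("node:stream");

// Hypothetical helper: consume `stream` with the 'readable'/read() pattern and
// call onLine once per complete line, no matter how the data is chunked.
function readLines(stream, onLine) {
  return new Promise((resolve, reject) => {
    let pending = ""; // text after the last "\n" seen so far
    stream.setEncoding("utf8");
    stream.on("readable", () => {
      let chunk;
      // read() hands back whatever is buffered right now, or null if nothing is.
      while ((chunk = stream.read()) !== null) {
        const parts = (pending + chunk).split("\n");
        pending = parts.pop(); // may be an incomplete line; keep it for the next read
        for (const line of parts) onLine(line);
      }
    });
    stream.on("end", () => {
      if (pending.length > 0) onLine(pending); // last line had no trailing "\n"
      resolve();
    });
    stream.on("error", reject);
  });
}

// Simulated file contents "alpha\nbeta\ngamma", split mid-line.
const source = Readable.from(["alpha\nbe", "ta\ngam", "ma"], { objectMode: false });
const lines = [];
readLines(source, (line) => lines.push(line)).then(() => {
  console.log(lines); // [ 'alpha', 'beta', 'gamma' ]
});
```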

> Does this mean that a chunk size will be no less than 64 * 1024 bytes?

No, it means that a chunk will be no larger than that. For a readable stream, highWaterMark determines how much data the stream will buffer ahead of your actual read() calls (64 * 1024 bytes is already the default for fs.createReadStream()). Once the stream fills that buffer, it stops reading from the file automatically until you read some of that data out of the internal buffer.
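
To see the upper bound in practice, the sketch below writes a throwaway 19-byte file (the size mentioned in the question) and records the size of each chunk delivered with a deliberately tiny highWaterMark of 4 bytes. chunkSizes and the temporary file are hypothetical; 'data' events are used so that each chunk read from the file is observed individually:

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: stream `filePath` and resolve with the byte length
// of each chunk the stream delivered.
function chunkSizes(filePath, highWaterMark) {
  return new Promise((resolve, reject) => {
    const sizes = [];
    fs.createReadStream(filePath, { highWaterMark })
      .on("data", (chunk) => sizes.push(chunk.length))
      .on("end", () => resolve(sizes))
      .on("error", reject);
  });
}

// Throwaway 19-byte file of newline-separated tokens.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "hwm-demo-"));
const file = path.join(dir, "tokens.txt");
fs.writeFileSync(file, "alpha\nbeta\ngamma\nz\n");

chunkSizes(file, 4)
  .then((sizes) => {
    // Every entry is at most 4; together they add up to the 19-byte file.
    console.log(sizes);
  })
  .finally(() => fs.rmSync(dir, { recursive: true, force: true }));
```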