I understand that Node.js uses the libuv library for I/O tasks, and that by default libuv uses four threads to handle them. I am trying to understand how libuv behaves when more than four I/O tasks are scheduled. Does a thread wait until it finishes reading its assigned file before reading another file, or does it switch between many unfinished files?
Below is code that logs "data" and "end" events from multiple read streams:
const fs = require('node:fs');
const path = require('node:path');

// Return the full paths of all regular files directly inside `directory`.
function getFilePaths(directory) {
  const files = fs.readdirSync(directory);
  return files
    .map((file) => path.join(directory, file))
    .filter((filePath) => fs.statSync(filePath).isFile());
}

const directory = 'D:\\test-folder';
const files = getFilePaths(directory);

// Open one read stream per file and remember its index for logging.
const streams = [];
files.forEach((file, index) => {
  streams.push([index, fs.createReadStream(file)]);
});

streams.forEach(([index, stream]) => {
  stream.on('data', () => {
    console.log(`Data: Stream ${index}`);
  });
  stream.on('end', () => {
    console.log(`End: Stream ${index}`);
  });
  stream.on('error', (err) => {
    console.error(`Error: Stream ${index}`);
  });
});
In the output, I expected at least one of the first four files to be fully read before any data chunk was received for the remaining files. Instead, it appears that the libuv threads don't wait until a file is fully read before starting to read another file.
Is this the expected behavior?
Yes, this is the expected behaviour.
It doesn't wait until a file is fully read before starting to read another file. Node.js reads files in chunks: a read stream reads 64 KiB at a time by default (you can change the chunk size for a stream by setting highWaterMark, as described in this answer), and the read of each chunk is dispatched to a thread in the thread pool. The thread that does the read is not locked to a particular file. It handles pending tasks according to libuv's scheduling logic. Each stream has at most one chunk read in flight at a time, so once a thread finishes a chunk it is free to pick up a pending read for a different file. Even when reads for one file are in flight, the thread pool can handle read requests for other files.
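To see the chunking in practice, here is a minimal sketch that counts the chunks a read stream emits with the default highWaterMark and with a smaller one. The helpers chunkSizes and makeTempFile are hypothetical names made up for this illustration, and the 256 KiB temporary file and 16 KiB setting are arbitrary assumptions.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: resolves with the size of every chunk the stream emits.
function chunkSizes(filePath, options = {}) {
  return new Promise((resolve, reject) => {
    const sizes = [];
    fs.createReadStream(filePath, options)
      .on('data', (chunk) => sizes.push(chunk.length))
      .on('end', () => resolve(sizes))
      .on('error', reject);
  });
}

// Hypothetical helper: writes a throwaway zero-filled file and returns its path.
function makeTempFile(bytes) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'chunk-demo-'));
  const filePath = path.join(dir, 'sample.bin');
  fs.writeFileSync(filePath, Buffer.alloc(bytes));
  return filePath;
}

async function main() {
  const filePath = makeTempFile(256 * 1024);
  const defaults = await chunkSizes(filePath);
  const small = await chunkSizes(filePath, { highWaterMark: 16 * 1024 });
  console.log(`default highWaterMark: ${defaults.length} chunks`);
  console.log(`16 KiB highWaterMark: ${small.length} chunks`);
  // Remove the temporary directory created for the demo.
  fs.rmSync(path.dirname(filePath), { recursive: true, force: true });
}

main().catch(console.error);
```

A smaller highWaterMark should produce more, smaller chunks for the same file, which also means more individual read requests handed to the thread pool.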
Don't assume that a thread is monopolized by a single file until that file is completely read. File reads are broken into chunks internally, so Node.js and libuv can handle the I/O for many files quite efficiently this way, even with only four threads.
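If you want to check the interleaving on your own machine, a rough sketch along these lines records the order of events across several streams and reports how many streams had already delivered data before the first 'end' event. The helpers recordEvents and makeTempFiles are hypothetical names for this illustration, and the file count (8) and size (1 MiB) are arbitrary assumptions. The exact ordering depends on the OS and on libuv's scheduling, so treat the printed number as an observation, not a guarantee.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: reads all files concurrently and resolves with the
// ordered list of events, each shaped like { type: 'data' | 'end', index }.
function recordEvents(filePaths) {
  return new Promise((resolve, reject) => {
    const events = [];
    let open = filePaths.length;
    if (open === 0) resolve(events);
    filePaths.forEach((filePath, index) => {
      fs.createReadStream(filePath)
        .on('data', () => events.push({ type: 'data', index }))
        .on('end', () => {
          events.push({ type: 'end', index });
          if (--open === 0) resolve(events);
        })
        .on('error', reject);
    });
  });
}

// Hypothetical helper: creates `count` throwaway files of `bytes` bytes each.
function makeTempFiles(count, bytes) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'interleave-demo-'));
  return Array.from({ length: count }, (_, i) => {
    const filePath = path.join(dir, `file-${i}.bin`);
    fs.writeFileSync(filePath, Buffer.alloc(bytes));
    return filePath;
  });
}

async function main() {
  const files = makeTempFiles(8, 1024 * 1024);
  const events = await recordEvents(files);
  // Collect the streams that emitted data before any stream finished.
  const firstEnd = events.findIndex((e) => e.type === 'end');
  const started = new Set(events.slice(0, firstEnd).map((e) => e.index));
  console.log(`${started.size} of ${files.length} streams delivered data before the first 'end'`);
  // Remove the temporary directory created for the demo.
  fs.rmSync(path.dirname(files[0]), { recursive: true, force: true });
}

main().catch(console.error);
```

If threads were tied to one file until it finished, only the first few streams would appear before the first 'end'. Seeing more streams than there are pool threads is consistent with reads being scheduled per chunk.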