I don't know if this is directly related to Bun or to the way I'm handling the files.
I need to generate a hash for a lot of files in a folder. Enumerating and listing the files was a breeze, but when I tried to generate a hash for each one, my RAM "exploded".
This is simplified code for understanding the problem.
let files: string[] = ["path to file1", "path to file2"];

async function hashFile(file: string) {
  let buffer = await Bun.file(file).arrayBuffer();
  return Bun.hash.crc32(buffer);
}

let hashes: number[] = [];

files.forEach(async (f) => {
  let hash = await hashFile(f);
  console.log(
    "Memory usage: ",
    Math.trunc(process.memoryUsage.rss() / 1024 / 1024),
    "MB"
  );
  hashes.push(hash);
});
Thanks in advance.
The combination of async callbacks and the forEach method will not wait for the promises to resolve, so all files can end up being loaded into memory at the same time. To limit concurrency, instead of using forEach, which kicks off every file at once, use a for...of loop. That way you process the files one by one, and only one file's content is in memory at any given time.
For example:
let files: string[] = ["path to file1", "path to file2"];

async function hashFile(file: string) {
  let buffer: ArrayBuffer | null = await Bun.file(file).arrayBuffer();
  const hash = Bun.hash.crc32(buffer);
  buffer = null; // Explicitly set to null to help with garbage collection
  return hash;
}

let hashes: number[] = [];

// Use a for...of loop to process files sequentially
for (const f of files) {
  let hash = await hashFile(f);
  console.log(
    "Memory usage: ",
    Math.trunc(process.memoryUsage.rss() / 1024 / 1024),
    "MB"
  );
  hashes.push(hash);
}
Also, in relation to garbage collection: the engine tries to manage memory efficiently on its own, but sometimes it helps to give it a hint. You can't truly force garbage collection, but if you're seeing consistent memory issues, Node.js lets you trigger it manually by starting the process with the --expose-gc flag and calling global.gc(); note that Bun runs on JavaScriptCore rather than V8 and exposes Bun.gc() for the same purpose. This isn't recommended for production, since it can degrade performance, but it can be useful for debugging or for specific scripts with known memory issues.
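As a minimal debugging-only sketch (assuming Bun.gc(true) behaves as described in Bun's docs and that Node was started with --expose-gc; the helper name forceGcForDebugging is made up for illustration):

// Debugging-only helper, not for production use.
// Node.js: global.gc is only defined when run as `node --expose-gc script.js`.
// Bun: Bun.gc(true) requests a synchronous collection (per Bun's docs).
function forceGcForDebugging() {
  const bunGlobal = (globalThis as any).Bun;
  if (bunGlobal && typeof bunGlobal.gc === "function") {
    bunGlobal.gc(true); // synchronous GC in Bun
  } else if (typeof (globalThis as any).gc === "function") {
    (globalThis as any).gc(); // exposed by Node's --expose-gc flag
  }
}

// Example use while investigating RSS growth between files:
// forceGcForDebugging();
// console.log("RSS:", Math.trunc(process.memoryUsage.rss() / 1024 / 1024), "MB");

Calling this between files (or batches) only tells you whether memory is actually reclaimable; it shouldn't be needed once concurrency is under control.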
You may also want to try explicit buffer nullification: after using the buffer, set it to null to hint to the garbage collector that it's okay to reclaim the memory (though there's no guarantee it will do so immediately).
If you still can't work through a large number of files, processing them with controlled concurrency using a third-party utility library like Bluebird or p-limit can be beneficial.
Here's an example of how you might implement your code using p-limit:
import pLimit from 'p-limit';

let files: string[] = ["path to file1", "path to file2"];

async function hashFile(file: string) {
  let buffer: ArrayBuffer | null = await Bun.file(file).arrayBuffer();
  const result = Bun.hash.crc32(buffer);
  buffer = null; // Hint to the garbage collector that the buffer can be reclaimed
  return result;
}

let hashes: number[] = [];

// Limit concurrency to, say, 5 files at a time
const limit = pLimit(5);

const tasks = files.map((f) => {
  return limit(async () => {
    let hash = await hashFile(f);
    console.log(
      "Memory usage: ",
      Math.trunc(process.memoryUsage.rss() / 1024 / 1024),
      "MB"
    );
    return hash;
  });
});

hashes = await Promise.all(tasks);
I hope this helps you.
Happy coding =)