I often find myself reading a large JSON file (usually an array of objects), manipulating each object, and writing the results to a new file.
To do this in Node (at least the reading part), I usually use the stream-json module, like this:
const fs = require('fs');
const StreamArray = require('stream-json/streamers/StreamArray');

const pipeline = fs.createReadStream('sample.json')
  .pipe(StreamArray.withParser());

pipeline.on('data', data => {
  // do something with each object in file
});
I've recently discovered Deno and would love to be able to use this workflow there.
It looks like the readJSON method from the Standard Library reads the entire contents of the file into memory, so I'm not sure it would be a good fit for processing a large file.
Is there a way to do this by streaming the data from the file using some of the lower-level methods that are built into Deno?
Circling back on this now that Deno 1.0 is out, in case anyone else is interested in doing something like this. I was able to piece together a small class that works for my use case. It's not nearly as robust as something like the stream-json package, but it handles large JSON arrays just fine.
import { EventEmitter } from "https://deno.land/std/node/events.ts";

export class JSONStream extends EventEmitter {
  private openBraceCount = 0;
  private tempUint8Array: number[] = [];
  private decoder = new TextDecoder();
  // string state, so braces and whitespace inside string values are left alone
  private inString = false;
  private escaped = false;

  constructor(private filepath: string) {
    super();
    this.stream();
  }

  async stream() {
    console.time("Run Time");
    const file = await Deno.open(this.filepath);
    // creates iterator from reader, default buffer size is 32kb
    for await (const buffer of Deno.iter(file)) {
      for (let i = 0, len = buffer.length; i < len; i++) {
        const uint8 = buffer[i];
        // inside a string: keep every byte and watch for the closing quote
        if (this.inString) {
          this.tempUint8Array.push(uint8);
          if (this.escaped) this.escaped = false;
          else if (uint8 === 92) this.escaped = true; // backslash
          else if (uint8 === 34) this.inString = false; // closing quote
          continue;
        }
        // remove whitespace between tokens (tab, LF, CR, space)
        if (uint8 === 9 || uint8 === 10 || uint8 === 13 || uint8 === 32) continue;
        // opening quote
        if (uint8 === 34) this.inString = true;
        // open brace
        if (uint8 === 123) {
          if (this.openBraceCount === 0) this.tempUint8Array = [];
          this.openBraceCount++;
        }
        this.tempUint8Array.push(uint8);
        // close brace
        if (uint8 === 125) {
          this.openBraceCount--;
          if (this.openBraceCount === 0) {
            const uint8Ary = new Uint8Array(this.tempUint8Array);
            const jsonString = this.decoder.decode(uint8Ary);
            const object = JSON.parse(jsonString);
            this.emit('object', object);
          }
        }
      }
    }
    file.close();
    console.timeEnd("Run Time");
  }
}
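Brace counting on raw bytes has to know when it is inside a string, since a `{` or `}` in a value would otherwise throw the count off. One way to try that out without touching the file system is to pull the scanning into a small standalone class and feed it in-memory chunks. This is only a sketch; `ObjectScanner` and `push` are made-up names, not part of Deno or stream-json.

```typescript
// Hypothetical helper: collects the top-level {...} objects found in a
// sequence of byte chunks. State is kept between calls, so an object (or a
// string) may be split across two chunks.
class ObjectScanner {
  private bytes: number[] = [];
  private depth = 0;        // number of currently open braces
  private inString = false; // true between two unescaped double quotes
  private escaped = false;  // true right after a backslash inside a string
  private decoder = new TextDecoder();

  // Returns every object that was completed by this chunk.
  push(chunk: Uint8Array): unknown[] {
    const done: unknown[] = [];
    for (let i = 0; i < chunk.length; i++) {
      const byte = chunk[i];
      if (this.inString) {
        if (this.depth > 0) this.bytes.push(byte);
        if (this.escaped) this.escaped = false;
        else if (byte === 92) this.escaped = true;   // backslash
        else if (byte === 34) this.inString = false; // closing quote
        continue;
      }
      if (byte === 34) this.inString = true; // opening quote
      if (byte === 123) this.depth++;        // {
      if (this.depth === 0) continue;        // skip "[", ",", "]" between objects
      this.bytes.push(byte);
      if (byte === 125 && --this.depth === 0) { // } closing a top-level object
        done.push(JSON.parse(this.decoder.decode(new Uint8Array(this.bytes))));
        this.bytes = [];
      }
    }
    return done;
  }
}

// Usage: split the input in the middle of a string value that contains braces.
const sampleBytes = new TextEncoder().encode(
  '[{"id":1,"title":"a } b {"},\n {"id":2,"urls":{"main":"x y"}}]',
);
const scanner = new ObjectScanner();
const scanned = scanner
  .push(sampleBytes.subarray(0, 20))
  .concat(scanner.push(sampleBytes.subarray(20)));
```

Whitespace outside strings is passed through untouched here, since `JSON.parse` accepts it anyway.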
Example usage
const stream = new JSONStream('test.json');

stream.on('object', (object: any) => {
  // do something with each object
});
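The other half of the workflow in the question is writing the modified objects to a new file. A rough sketch of doing that incrementally is below: each object is serialized on its own, so the full output never has to sit in memory. `JsonArrayWriter` is a made-up name, and the `write` callback is an assumption standing in for whatever actually appends text to the output file.

```typescript
// Hypothetical helper: emits the text of a JSON array one element at a time.
// The `write` callback is assumed to append the given text to the output
// (for example a file opened for writing); here it is just a function.
class JsonArrayWriter {
  private count = 0;

  constructor(private write: (text: string) => void) {
    this.write("[");
  }

  // Serializes one object, with a comma separator before every element but the first.
  add(obj: unknown): void {
    this.write((this.count === 0 ? "\n" : ",\n") + JSON.stringify(obj));
    this.count++;
  }

  // Closes the array; call once after the last object.
  end(): void {
    this.write("\n]\n");
  }
}

// Usage: collect the pieces in an array instead of a file, to keep the sketch self-contained.
const pieces: string[] = [];
const writer = new JsonArrayWriter((text) => { pieces.push(text); });
const incoming = [
  { id: 1, title: "first" },
  { id: 2, title: "second" },
];
for (const obj of incoming) {
  // manipulate each object before writing it back out
  writer.add({ id: obj.id, title: obj.title.toUpperCase() });
}
writer.end();
const outputText = pieces.join("");
```

In the streaming case, `add` would be called from the `'object'` handler and `end` once the input has been fully read.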
Processing a ~4.8 MB JSON file with ~20,000 small objects in it, like these:
[
  {
    "id": 1,
    "title": "in voluptate sit officia non nesciunt quis",
    "urls": {
      "main": "https://www.placeholder.com/600/1b9d08",
      "thumbnail": "https://www.placeholder.com/150/1b9d08"
    }
  },
  {
    "id": 2,
    "title": "error quasi sunt cupiditate voluptate ea odit beatae",
    "urls": {
      "main": "https://www.placeholder.com/600/1b9d08",
      "thumbnail": "https://www.placeholder.com/150/1b9d08"
    }
  }
  ...
]
took 127 ms.
❯ deno run -A parser.ts
Run Time: 127ms