
Load large json file (100GB+) in rust

I need to load a large JSON file (100GB+). The objects in this file don't share a fixed structure and are almost never the same. I found a crate called nop-json, but I was unable to get it to work the way I want. This is my current solution, but it feels a bit like cheating.

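    use std::{fs::File, io::{BufRead, BufReader}};
    let file = File::open("./file.json")?;
    let reader = BufReader::new(file);
    for line in reader.lines() {
        let line = line?; // `line` is a String holding one line of text
    }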

I am reading the file like a text file and iterating over it line by line. The problem is that with this solution I am reading it as a string, and it loads the entire file into memory.

I am new to Rust, so I am looking for some help with this problem. I have a successful implementation in Python that works great, but it's too slow.


Thank you for the replies so far. Here is some more information:

My *.json file has one array containing millions of objects. Example:

    [
        {
            "foo": "bar",
            "bar": "foor"
        }, {
            "foo": "bar",
            "bar": "foor"
        }, {
            "foo": "bar",
            "bar": "foor"
        }, {
            "foo": "bar",
            "bar": "foor"
        }
    ]


The problem with reading the file as text this way is that an object does not always occupy exactly one line. The number of lines per object varies.

One possible solution might be to read a chunk of the file and then check where a JSON object ended by looking for a pattern like }, {. But this seems inefficient.


  • First off, if you accept normal full JSON, your problem is really hard.

    So I assume the following:

    • Your file always starts with a [.
    • Then, an arbitrary number of valid JSON values follow, separated by ,.
    • After the last JSON value there is a closing ].
    • Every single JSON value is small enough to be parsed and held in memory in its entirety.

    Meaning, we now have a bunch of streamable separate JSON objects that are wrapped by our own array representation.

    With that, we can utilize serde_json and a little bit of glue to parse the file value by value:

    use std::error::Error;
    use std::io::Read;

    use serde_json::{Deserializer, Value};

    const JSON_FILE: &[u8] = br#"[
        {
            "foo": "bar",
            "bar": "foor"
        }, {
            "foo": "bar",
            "bar": "foor"
        }, {
            "foo": "bar",
            "bar": "foor"
        }, {
            "foo": "bar",
            "bar": "foor"
        }
    ]"#;

    fn open_file() -> impl Read {
        JSON_FILE
    }

    // Parses one JSON value from the front of the stream.
    fn take_json_value(input_stream: &mut dyn Read) -> Result<Value, Box<dyn Error>> {
        Ok(Deserializer::from_reader(input_stream)
            .into_iter()
            .next()
            .ok_or("Expected a JSON value!")??)
    }

    fn main() {
        // Is of type `impl Read`, and can only be read once.
        // (to reproduce the situation of reading a file)
        let mut input_stream = open_file();

        // Skip initial `[`
        let mut skipped = 0u8;
        input_stream
            .read_exact(std::slice::from_mut(&mut skipped))
            .unwrap();
        assert_eq!(skipped, b'[');

        loop {
            let value = take_json_value(&mut input_stream).unwrap();
            println!("- {:?}", value);

            // Skip `,` after the value
            input_stream
                .read_exact(std::slice::from_mut(&mut skipped))
                .unwrap();
            if skipped != b',' {
                break;
            }
        }

        // Verify that the ending `]` exists
        let mut leftover_data = vec![b'[', skipped];
        input_stream.read_to_end(&mut leftover_data).unwrap();
        serde_json::from_slice::<[u8; 0]>(&leftover_data).unwrap();
    }

    Output:

    - Object {"bar": String("foor"), "foo": String("bar")}
    - Object {"bar": String("foor"), "foo": String("bar")}
    - Object {"bar": String("foor"), "foo": String("bar")}
    - Object {"bar": String("foor"), "foo": String("bar")}
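
As an alternative sketch (not part of the answer above, and using only the standard library): instead of skipping the [ and , bytes by hand, one could track nesting depth and string state while scanning the stream, and hand each top-level element to a callback as raw bytes. The name `for_each_element` and its callback parameter are hypothetical. The sketch relies on the same assumptions as the list above (one outer array, each element fits in memory), tolerates whitespace around the separating commas, and does not validate the JSON inside an element; that is left to whatever parser receives the bytes.

```rust
use std::io::{self, BufReader, Read};

/// Hypothetical helper: calls `on_element` with the raw bytes of every
/// top-level element of a JSON array and returns how many were seen.
/// Only one element is held in memory at a time.
fn for_each_element<R: Read>(
    reader: R,
    mut on_element: impl FnMut(&[u8]),
) -> io::Result<usize> {
    let mut depth = 0usize; // 0 = before `[`, 1 = directly inside the outer array
    let mut in_string = false; // inside a JSON string literal?
    let mut escaped = false; // was the previous string byte a backslash?
    let mut element: Vec<u8> = Vec::new();
    let mut count = 0usize;

    // BufReader keeps the byte-by-byte scan from issuing one read call per byte.
    for byte in BufReader::new(reader).bytes() {
        let b = byte?;
        if in_string {
            // Brackets and commas inside strings must not affect the depth.
            element.push(b);
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            b'[' if depth == 0 => depth = 1, // opening bracket of the outer array
            b'"' if depth >= 1 => {
                in_string = true;
                element.push(b);
            }
            b'[' | b'{' if depth >= 1 => {
                element.push(b);
                depth += 1;
            }
            b']' | b'}' if depth > 1 => {
                depth -= 1;
                element.push(b);
            }
            b',' | b']' if depth == 1 => {
                // A top-level `,` or the closing `]` ends the current element.
                if !element.is_empty() {
                    on_element(&element);
                    count += 1;
                    element.clear();
                }
                if b == b']' {
                    return Ok(count);
                }
            }
            _ if depth == 0 => {
                if !b.is_ascii_whitespace() {
                    return Err(io::Error::new(io::ErrorKind::InvalidData, "expected `[`"));
                }
            }
            // Whitespace between elements is not part of any element.
            _ if depth == 1 && b.is_ascii_whitespace() => {}
            _ => element.push(b),
        }
    }
    Err(io::Error::new(io::ErrorKind::UnexpectedEof, "missing closing `]`"))
}
```

Each slice passed to the callback could then be parsed on its own, for example with `serde_json::from_slice::<Value>(element)`, and a real file would be supplied by passing `File::open("./file.json")?` as the reader in place of an in-memory buffer.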