
Read bincode serialized structs in parallel from file

I am currently using serde-jsonlines to serialize a large number of identical structs to file. I am then using rayon to read data out of this file in parallel using par_bridge:

let reader = JsonLinesReader::new(input_file);
let results: Vec<ResultStruct> = reader
    .read_all::<MyStruct>()
    .par_bridge()
    .into_par_iter()
    .map(|my_struct| {
        // do processing of my struct and return result
    }).collect();

This works since JsonLinesReader returns an iterator over the lines of the input file. I'd like to use bincode to encode my structs instead as this results in a smaller file on disk. I have the following playground that works as expected:

use bincode;
use serde::{Deserialize, Serialize};
use std::fs::File;
use std::io::{BufWriter, Write};

#[derive(Debug, Deserialize, Serialize)]
struct MyStruct {
    name: String,
    value: Vec<u64>,
}

pub fn playground() {
    let s1 = MyStruct {
        name: "Hello".to_string(),
        value: vec![1, 2, 3],
    };
    let s2 = MyStruct {
        name: "World!".to_string(),
        value: vec![3, 4, 5, 6],
    };

    let out_file = File::create("test.bin").expect("Unable to create file");
    let mut writer = BufWriter::new(out_file);

    let s1_encoded: Vec<u8> = bincode::serialize(&s1).unwrap();
    writer.write_all(&s1_encoded).expect("Unable to write data");

    let s2_encoded: Vec<u8> = bincode::serialize(&s2).unwrap();
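    writer.write_all(&s2_encoded).expect("Unable to write data");

    // Flush buffered bytes to disk before the file is reopened for reading.
    writer.flush().expect("Unable to flush data");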

    let mut in_file = File::open("test.bin").expect("Unable to open file");
    let s1_decoded: MyStruct =
        bincode::deserialize_from(&mut in_file).expect("Unable to read data");
    let s2_decoded: MyStruct =
        bincode::deserialize_from(&mut in_file).expect("Unable to read data");

    println!("s1_decoded: {:?}", s1_decoded);
    println!("s2_decoded: {:?}", s2_decoded);
}

Is it possible to read the structs out in parallel in a manner similar to what I am currently doing with serde-jsonlines? I imagine this might not be possible since each struct is not terminated by a newline and thus there is no sensible way to chunk up the input stream to allow processing by multiple threads.
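One common way to make a binary stream chunkable is to prefix each encoded record with its byte length, so a reader can split the file into frames without decoding them. The following is a minimal sketch of that idea using only the standard library. The helper names `write_frame` and `split_frames` are hypothetical, and the byte-string payloads stand in for the output of `bincode::serialize`.

```rust
use std::io::{self, Cursor, Read, Write};

/// Write one record as an 8-byte little-endian length followed by its bytes.
/// Hypothetical helper; `payload` stands in for `bincode::serialize` output.
fn write_frame<W: Write>(writer: &mut W, payload: &[u8]) -> io::Result<()> {
    writer.write_all(&(payload.len() as u64).to_le_bytes())?;
    writer.write_all(payload)
}

/// Read frames until end of input, returning each payload undecoded.
fn split_frames<R: Read>(reader: &mut R) -> io::Result<Vec<Vec<u8>>> {
    let mut frames = Vec::new();
    let mut len_buf = [0u8; 8];
    loop {
        // EOF while reading a length header ends the stream. In this sketch
        // a truncated header is treated the same way as a clean EOF.
        match reader.read_exact(&mut len_buf) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
        let len = u64::from_le_bytes(len_buf) as usize;
        let mut payload = vec![0u8; len];
        // A payload shorter than its declared length is reported as an error.
        reader.read_exact(&mut payload)?;
        frames.push(payload);
    }
    Ok(frames)
}

fn main() -> io::Result<()> {
    // The in-memory buffer stands in for the file on disk.
    let mut file_bytes = Vec::new();
    write_frame(&mut file_bytes, b"first record")?;
    write_frame(&mut file_bytes, b"second")?;

    let frames = split_frames(&mut Cursor::new(file_bytes))?;
    for frame in &frames {
        println!("frame of {} bytes", frame.len());
    }
    Ok(())
}
```

Because each frame is self-contained, the frames could then be handed to worker threads and decoded independently, at the cost of 8 extra bytes per record.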


  • Note that the serde-jsonlines code uses a single thread to parse the JSON, and only goes multithreaded for the map processing. The same thing can be done with bincode:

    // Needs `use rayon::prelude::*;` and `use std::iter;`
    let results: Vec<ResultStruct> = iter::from_fn (
            move || bincode::deserialize_from (&mut in_file).ok())
        .par_bridge()
        .map (|my_struct: MyStruct| {
            // do processing of my struct and return result
        })
        .collect();

    (I also removed the redundant call to into_par_iter since par_bridge already creates a parallel iterator).
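To make the division of labour concrete, here is a rough standard-library sketch of the same pattern: one shared sequential iterator feeds several worker threads, so only fetching the next item is serialized while the processing runs in parallel. This is not rayon's implementation. `process_in_parallel` is a hypothetical helper, and the integer range stands in for the `iter::from_fn` decoder above. As with `par_bridge`, the output order is not preserved.

```rust
use std::sync::Mutex;
use std::thread;

/// Pull items from one shared sequential iterator and process them on
/// `workers` threads. Hypothetical helper; output order is not preserved.
fn process_in_parallel<I, T, R, F>(source: I, workers: usize, process: F) -> Vec<R>
where
    I: Iterator<Item = T> + Send,
    T: Send,
    R: Send,
    F: Fn(T) -> R + Sync,
{
    let source = Mutex::new(source);
    let results = Mutex::new(Vec::new());
    thread::scope(|scope| {
        for _ in 0..workers {
            scope.spawn(|| loop {
                // Only this short step is serialized: fetch the next item.
                // The lock is released at the end of the statement.
                let next = source.lock().unwrap().next();
                match next {
                    Some(item) => {
                        // The expensive work runs outside the source lock.
                        let out = process(item);
                        results.lock().unwrap().push(out);
                    }
                    None => break,
                }
            });
        }
    });
    results.into_inner().unwrap()
}

fn main() {
    // The range stands in for a sequential stream of decoded structs.
    let mut squares = process_in_parallel(1..=8u64, 4, |n| n * n);
    squares.sort_unstable();
    println!("{:?}", squares);
}
```

This only pays off when the per-item processing is substantially more expensive than decoding, since decoding itself still happens on one thread at a time.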