
Parallel json deserialization fails with valid json


I want to deserialize JSON values in parallel using rayon. A valid JSON document from the serde_json example fails to deserialize inside par_iter, even though it parses correctly without parallelization. This is the code:

use rayon::prelude::*; // 1.7.0
use serde_json::{Result, Value};

fn main() -> Result<()> {
    let data = r#"
        {
            "name": "John Doe",
            "age": 43,
            "phones": [
                "+44 1234567",
                "+44 2345678"
            ]
        }"#;
    let v: Value = serde_json::from_str(data)?;
    println!("Please call {} at the number {}", v["name"], v["phones"][0]);

    let mut batch = Vec::<String>::new();
    batch.push(data.to_string());
    batch.push(data.to_string());
    
    let _values = batch.par_iter()
        .for_each(|json: &String| {
            serde_json::from_str(json.as_str()).unwrap()
        });
        
    Ok(())
}

And this is the error:

thread 'thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Error("invalid type: map, expected unit", line: 2, column: 8)', src/main.rs:23:49

IIRC, I've seen other par_iter examples that use unwrap inside. Is this not recommended? In my case, I want to do it because I need the program to panic if an invalid input comes in.


Solution

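  • serde_json::from_str is generic over its return type, and the compiler infers that type from the context in which the result is used. In your case, the closure passed to for_each must return (), and the from_str(...).unwrap() call is the closure's final expression, so from_str attempts to deserialize the JSON into (). That fails because the input is a map, not a unit value, hence "invalid type: map, expected unit".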

    Use map().collect() together with a : Vec<Value> annotation to make this work:

    use rayon::prelude::*; // 1.7.0
    use serde_json::{Result, Value};
    
    fn main() -> Result<()> {
        let data = r#"
            {
                "name": "John Doe",
                "age": 43,
                "phones": [
                    "+44 1234567",
                    "+44 2345678"
                ]
            }"#;
        let v: Value = serde_json::from_str(data)?;
        println!("Please call {} at the number {}", v["name"], v["phones"][0]);
    
        let mut batch = Vec::<String>::new();
        batch.push(data.to_string());
        batch.push(data.to_string());
    
        let values: Vec<Value> = batch
            .par_iter()
            .map(|json: &String| serde_json::from_str(json.as_str()).unwrap())
            .collect();
    
        println!("Values:\n{:#?}", values);
    
        Ok(())
    }
    
    Please call "John Doe" at the number "+44 1234567"
    Values:
    [
        Object {
            "age": Number(43),
            "name": String("John Doe"),
            "phones": Array [
                String("+44 1234567"),
                String("+44 2345678"),
            ],
        },
        Object {
            "age": Number(43),
            "name": String("John Doe"),
            "phones": Array [
                String("+44 1234567"),
                String("+44 2345678"),
            ],
        },
    ]
    

    Although honestly, it's a little unusual to use serde_json::Value; usually people deserialize directly into a struct:

    use rayon::prelude::*;
    use serde::{Deserialize, Serialize};
    use serde_json::Result;
    
    #[derive(Debug, Serialize, Deserialize)]
    struct Entry {
        name: String,
        age: u32,
        phones: Vec<String>,
    }
    
    fn main() -> Result<()> {
        let data = r#"
            {
                "name": "John Doe",
                "age": 43,
                "phones": [
                    "+44 1234567",
                    "+44 2345678"
                ]
            }"#;
        let v: Entry = serde_json::from_str(data)?;
        println!("Please call {} at the number {}", v.name, v.phones[0]);
    
        let mut batch = Vec::<String>::new();
        batch.push(data.to_string());
        batch.push(data.to_string());
    
        let values: Vec<Entry> = batch
            .par_iter()
            .map(|json: &String| serde_json::from_str(json.as_str()).unwrap())
            .collect();
    
        println!("Values:\n{:#?}", values);
    
        Ok(())
    }
    
    Please call John Doe at the number +44 1234567
    Values:
    [
        Entry {
            name: "John Doe",
            age: 43,
            phones: [
                "+44 1234567",
                "+44 2345678",
            ],
        },
        Entry {
            name: "John Doe",
            age: 43,
            phones: [
                "+44 1234567",
                "+44 2345678",
            ],
        },
    ]