
Create JsonValue from multiple columns in Rust Polars DataFrame


I have a DataFrame with two columns in it, one an integer and one a string. For each row, I need to combine these two values into a JSON object with a certain schema ({"a": a, "b": b}). Then I need to bind a vector of these objects to an sqlx query so I can insert them into my Postgres database.

This works with just a single column:

use serde_json::json;

//...

    let cols = merged_df
        .columns(["a"])
        .unwrap();
    let values: Vec<serde_json::Value> = cols[0]
        .str()?
        .into_iter()
        .map(|a| json!({"a": a}))
        .collect();

So my questions are:

  1. How do I draw from multiple columns when creating my JSON?
  2. Is using map the most idiomatic way to create the final values, and is it performant? I've found other multi-column approaches that use apply or with_column, but these return new DataFrame objects, and I don't think a column can hold serde_json::Value objects. Maybe I could use into_struct with a custom struct that implements serde::Serialize, but I think I would still have to iterate over the column a second time to serialize each struct into a serde_json::Value to send to sqlx. If so, iterating twice seems inefficient.

Solution

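  • You can zip the two columns' iterators together and build each object in a single map pass. If you have more than two columns, you might want itertools::izip!, which yields flat tuples instead of nested ones.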

    use polars::prelude::*;
    use serde_json::json;
    
    fn main() -> Result<(), Box<dyn std::error::Error>> {
        let df = df![
          "a" => [Some(1), None, Some(3)],
          "b" => [Some("x"), Some("y"), None],
        ]?;
    
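        // Downcast each column to its typed ChunkedArray; this returns an
        // error if the column's dtype does not match.
        let a = df.column("a")?.i32()?;
        let b = df.column("b")?.str()?;

        // Iterating yields Option<T> per row (None for nulls), which json!
        // serializes as null. Both columns are walked once, in lockstep.
        let j = a
            .into_iter()
            .zip(b)
            .map(|(a, b)| json!({ "a": a, "b": b }))
            .collect::<Vec<_>>();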
    
        println!("{j:#?}");
    
        Ok(())
    }

    Output:

    [
        Object {
            "a": Number(1),
            "b": String("x"),
        },
        Object {
            "a": Null,
            "b": String("y"),
        },
        Object {
            "a": Number(3),
            "b": Null,
        },
    ]
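
To illustrate why izip! becomes attractive beyond two columns, here is a minimal, standard-library-only sketch of zipping three columns with chained zip calls. The third column c, the Row struct, and the zip_three function are hypothetical names invented for this illustration; the plain slices stand in for the iterators that .i32()?, .str()?, and .f64()? would give on real polars columns, and Row stands in for the json!({...}) object.

```rust
/// One output row; a hypothetical stand-in for the `json!({...})` object.
#[derive(Debug, PartialEq)]
struct Row<'a> {
    a: Option<i32>,
    b: Option<&'a str>,
    c: Option<f64>,
}

/// Walk three columns in lockstep, in a single pass.
/// `zip` stops at the shortest input, so the columns are assumed to have
/// equal length (which is always the case for columns of one DataFrame).
fn zip_three<'a>(
    a: &[Option<i32>],
    b: &[Option<&'a str>],
    c: &[Option<f64>],
) -> Vec<Row<'a>> {
    a.iter()
        .zip(b)
        .zip(c)
        // Chained zips nest their tuples: ((a, b), c), not (a, b, c).
        .map(|((&a, &b), &c)| Row { a, b, c })
        .collect()
}
```

Calling zip_three(&[Some(1), None], &[Some("x"), Some("y")], &[Some(0.5), None]) returns two rows, with None wherever a column had a null. With itertools, izip!(a, b, c) yields flat (a, b, c) tuples, so the closure pattern stays readable as the number of columns grows; either way each column is iterated only once.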