I have a DataFrame with two columns in it, one an integer and one a string. For each row, I need to combine these two values into a JSON object with a certain schema ({"a": a, "b": b}). Then I need to bind a vector of these objects to an sqlx query so I can insert them into my Postgres database.
This works with just a single column:
use serde_json::json;
//...
let cols = merged_df
    .columns(["a"])
    .unwrap();
let values: Vec<serde_json::Value> = cols[0]
    .str()?
    .into_iter()
    .map(|a| json!({"a": a}))
    .collect();
So my questions are: is iterating over the columns with map the most idiomatic way to create the final value? Is it performant? I've found other multi-column approaches which use apply or with_column, but these return new DataFrame objects, and I don't think I can have a column holding a serde_json::Value object. Maybe I can use into_struct and a custom struct which I populate that implements serde_json::Serialize, but I think I'd still have to iterate through the column again somehow to serialize the structs into the serde_json::Value objects to send to sqlx. If so, iterating through twice seems inefficient.

You can just zip two columns together. If you have more than two columns, you might want itertools::izip!.
use polars::prelude::*;
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let df = df![
        "a" => [Some(1), None, Some(3)],
        "b" => [Some("x"), Some("y"), None],
    ]
    .unwrap();

    let a = df.column("a")?.i32()?;
    let b = df.column("b")?.str()?;

    // Walk both typed columns in lockstep and build one JSON object per row;
    // missing values come through as Option::None and serialize to JSON null.
    let j = a
        .into_iter()
        .zip(b)
        .map(|(a, b)| json!({ "a": a, "b": b }))
        .collect::<Vec<_>>();

    println!("{j:#?}");
    Ok(())
}
[
    Object {
        "a": Number(1),
        "b": String("x"),
    },
    Object {
        "a": Null,
        "b": String("y"),
    },
    Object {
        "a": Number(3),
        "b": Null,
    },
]
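For three or more columns, itertools::izip! avoids nesting zip calls. A minimal sketch, assuming a hypothetical extra i32 column "c" and the itertools crate added as a dependency:

use itertools::izip;
use polars::prelude::*;
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let df = df![
        "a" => [Some(1), None, Some(3)],
        "b" => [Some("x"), Some("y"), None],
        "c" => [Some(10), Some(20), None], // hypothetical extra column
    ]
    .unwrap();

    let a = df.column("a")?.i32()?;
    let b = df.column("b")?.str()?;
    let c = df.column("c")?.i32()?;

    // izip! iterates all three typed columns in lockstep, yielding one tuple per row.
    let j = izip!(a, b, c)
        .map(|(a, b, c)| json!({ "a": a, "b": b, "c": c }))
        .collect::<Vec<_>>();

    println!("{j:#?}");
    Ok(())
}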
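As for getting the Vec<serde_json::Value> into Postgres, one option is to bind the whole vector as a single jsonb[] parameter and UNNEST it server-side. A rough sketch, not tested here, assuming sqlx with the postgres and json features (so that serde_json::Value binds as jsonb and a Vec of them as jsonb[]), an existing PgPool, and a hypothetical table my_table with a jsonb column payload:

use sqlx::PgPool;

// `values` is the Vec<serde_json::Value> built above; `pool` is an existing PgPool.
async fn insert_rows(pool: &PgPool, values: Vec<serde_json::Value>) -> Result<(), sqlx::Error> {
    // Bind the vector as one jsonb[] parameter and let Postgres expand it
    // into one row per element. Table and column names are placeholders.
    sqlx::query("INSERT INTO my_table (payload) SELECT * FROM UNNEST($1::jsonb[])")
        .bind(&values)
        .execute(pool)
        .await?;
    Ok(())
}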