The problem I have is reading a flattened JSON file into a Polars DataFrame in Rust.
Here is an example of the flattened JSON format. How can this structure be read into a DataFrame without declaring each column's dtype in a struct?
{
  "data": [
    {
      "requestId": "IBM",
      "date": "2024-03-19",
      "sales": 61860,
      "company": "International Business Machines",
      "price": 193.34,
      "score": 7
    },
    {
      "requestId": "AAPL",
      "date": "2024-03-19",
      "sales": 383285,
      "company": "Apple Inc.",
      "price": 176.08,
      "score": 9
    },
    {
      "requestId": "MSFT",
      "date": "2024-03-19",
      "sales": 211915,
      "company": "Microsoft Corporation",
      "price": 421.41,
      "score": 7
    }
  ]
}
There are only Integers, Floats, and Strings in the data.
Here is the struct I tried. If there are 200+ columns that can change, would it be better to store the columns dynamically in a HashMap?
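use std::collections::HashMap;

use serde::{Deserialize, Serialize};

#[derive(Debug, Deserialize, Serialize)]
#[serde(rename_all = "camelCase")]
struct Row {
    // Serialized as "requestId" because of rename_all = "camelCase".
    request_id: String,
    date: String,
    // Every remaining key in the object is collected into this map.
    #[serde(flatten)]
    company_data: HashMap<String, serde_json::Value>,
}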
This is the second half of my earlier question about the non-flattened JSON data: Transform JSON Key into a Polars DataFrame
This format is almost what polars' JsonReader expects; only the top-level object is the problem. However, we can strip it with string manipulation:
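// Requires polars with the "json" feature enabled.
use polars::prelude::*;
use std::error::Error;

pub fn flattened(json: &str) -> Result<DataFrame, Box<dyn Error>> {
    // Strip the outer braces of the top-level object.
    let json = json.trim();
    let json = json
        .strip_prefix("{")
        .ok_or("invalid JSON")?
        .strip_suffix("}")
        .ok_or("invalid JSON")?;
    // Strip the `"data":` key so that only the array of rows remains.
    let json = json.trim_start();
    let json = json.strip_prefix(r#""data""#).ok_or("invalid JSON")?;
    let json = json.trim_start();
    let json = json.strip_prefix(":").ok_or("invalid JSON")?;
    // JsonReader infers the column dtypes from the rows.
    let json_reader = JsonReader::new(std::io::Cursor::new(json));
    let mut df = json_reader.finish()?;
    // Dates are read as strings, so cast the column explicitly.
    let date = df.column("date")?.cast(&DataType::Date)?;
    df.replace("date", date)?;
    Ok(df)
}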
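To check what the stripping leaves behind before it reaches JsonReader, the string handling can be isolated into a small helper that depends only on the standard library. This is a minimal sketch: `strip_data_wrapper` is a hypothetical name, the sample input is a shortened version of the data in the question, and it assumes `"data"` is the first and only key of the top-level object.

```rust
/// Strips a top-level `{"data": ...}` wrapper and returns the inner JSON text.
/// Assumption: `"data"` is the first and only key of the top-level object.
fn strip_data_wrapper(json: &str) -> Option<&str> {
    let inner = json
        .trim()
        .strip_prefix('{')?
        .strip_suffix('}')?
        .trim_start()
        .strip_prefix(r#""data""#)?
        .trim_start()
        .strip_prefix(':')?;
    Some(inner.trim())
}

fn main() {
    let json = r#"{
        "data": [
            {"requestId": "IBM", "sales": 61860, "price": 193.34},
            {"requestId": "AAPL", "sales": 383285, "price": 176.08}
        ]
    }"#;

    let inner = strip_data_wrapper(json).expect("expected a {\"data\": [...]} wrapper");
    // What remains is a bare JSON array of row objects.
    assert!(inner.starts_with('[') && inner.ends_with(']'));
    println!("{inner}");

    // Input without the wrapper is rejected rather than passed on unchanged.
    assert_eq!(strip_data_wrapper("[1, 2, 3]"), None);
}
```

Because this is plain string matching rather than JSON parsing, a top-level object with additional keys after `"data"` would leave text that is no longer a valid array, so it only suits input shaped like the example above.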