I have the following JSON file and would like to load it into a Polars DataFrame. How can I use the pl.read_json function, which has a schema parameter?
{
"data": {
"names": [
"A",
"B",
"C",
"D",
"E"
],
"ndarray": [
[
"abc",
true,
0.374618,
1,
0.83252
],
[
"hello",
false,
0.1265374619,
0,
0.253
]
]
}
}
I'm not sure you can use pl.read_json with a file structured that way.
The issue is that each inner list in ndarray contains mixed types, which Polars does not allow within a single list.
[
"abc", # str
true, # bool
0.374618, # float
1, # int
0.83252 # float
]
Polars must choose a single type for the whole list; in this case str is chosen as the "supertype":
pl.select(pl.lit("""["abc", true, 1.23]""").str.json_decode())
shape: (1, 1)
┌─────────────────────────┐
│ literal │
│ --- │
│ list[str] │
╞═════════════════════════╡
│ ["abc", "true", "1.23"] │
└─────────────────────────┘
Once the values have been cast to str, there's no way to access the "original" type information.
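For comparison, here is a quick sketch of what Python's built-in json module does with the same row: it returns a plain list in which every element keeps its own type. The variable names are only for illustration.

```python
import json

# One row from the question's "ndarray", written as a JSON string
row = json.loads('["abc", true, 0.374618, 1, 0.83252]')

# Each element keeps its own Python type after parsing
types = [type(value).__name__ for value in row]
print(types)  # ['str', 'bool', 'float', 'int', 'float']
```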
If you load the JSON first, outside of Polars (e.g. using the json module), you can pass the result to pl.DataFrame() directly.
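import json
import polars as pl

with open("data.json") as f:
    data = json.load(f)["data"]

# Each inner list is one row, so state the orientation explicitly
df = pl.DataFrame(data["ndarray"], schema=data["names"], orient="row")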
shape: (2, 5)
┌───────┬───────┬──────────┬─────┬─────────┐
│ A ┆ B ┆ C ┆ D ┆ E │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ bool ┆ f64 ┆ i64 ┆ f64 │
╞═══════╪═══════╪══════════╪═════╪═════════╡
│ abc ┆ true ┆ 0.374618 ┆ 1 ┆ 0.83252 │
│ hello ┆ false ┆ 0.126537 ┆ 0 ┆ 0.253 │
└───────┴───────┴──────────┴─────┴─────────┘