I tried creating a simple polars DataFrame with two columns:
import polars as pl
data = {"a": [ ["X"], ["Y"], []], "b": [3, 4, 5]}
# this works normally
df = pl.DataFrame(data)
df_schema = [("a", pl.List),
("b", pl.Int8)]
# this breaks - invalid series dtype: expected `Utf8`, got `null`
df = pl.DataFrame(data, schema=df_schema)
Without a schema, it creates the following DataFrame:
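shape: (3, 2)
┌───────────┬─────┐
│ a         ┆ b   │
│ ---       ┆ --- │
│ list[str] ┆ i64 │
╞═══════════╪═════╡
│ ["X"]     ┆ 3   │
│ ["Y"]     ┆ 4   │
│ []        ┆ 5   │
└───────────┴─────┘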
But when I specify the same schema explicitly, it breaks. What could be the problem?
I am using polars==0.19.12.
You need to specify the inner dtype of the list, i.e. pl.List(pl.Utf8) instead of a bare pl.List:
In [59]: import polars as pl
...:
...: data = {"a": [ ["X"], ["Y"], []], "b": [3, 4, 5]}
...:
...: # this works normally
...: df = pl.DataFrame(data)
...:
...: df_schema = [("a", pl.List(pl.Utf8)),
...: ("b", pl.Int8)]
...:
...: df = pl.DataFrame(data, schema=df_schema)
In [60]: df
Out[60]:
shape: (3, 2)
┌───────────┬─────┐
│ a         ┆ b   │
│ ---       ┆ --- │
│ list[str] ┆ i8  │
╞═══════════╪═════╡
│ ["X"]     ┆ 3   │
│ ["Y"]     ┆ 4   │
│ []        ┆ 5   │
└───────────┴─────┘