python-polars, rust-polars

Polars schema breaks with List type


I tried creating a simple Polars DataFrame with two columns:

import polars as pl

data = {"a": [ ["X"], ["Y"], []], "b": [3, 4, 5]}

# this works normally
df = pl.DataFrame(data)

df_schema = [("a", pl.List),
             ("b", pl.Int8)]

# this breaks - invalid series dtype: expected `Utf8`, got `null`
df = pl.DataFrame(data, schema=df_schema)

Without a schema, it creates the following DataFrame:

shape: (3, 2)
a           b
list[str]   i64
["X"]       3
["Y"]       4
[]          5

But when I specify the same schema explicitly, it breaks. What could be the problem?

Using polars==0.19.12.


Solution

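  • You need to specify the inner dtype. A bare pl.List does not say what the lists contain, so the empty list [] in column "a" is most likely resolved with a `null` inner type, which clashes with the `Utf8` inferred from the other rows. Writing pl.List(pl.Utf8) removes the ambiguity: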

    
    In [59]: import polars as pl
        ...:
        ...: data = {"a": [ ["X"], ["Y"], []], "b": [3, 4, 5]}
        ...:
        ...: # this works normally
        ...: df = pl.DataFrame(data)
        ...:
        ...: df_schema = [("a", pl.List(pl.Utf8)),
        ...:              ("b", pl.Int8)]
        ...:
        ...: df = pl.DataFrame(data, schema=df_schema)
    
    In [60]: df
    Out[60]:
    shape: (3, 2)
    ┌───────────┬─────┐
    │ a         ┆ b   │
    │ ---       ┆ --- │
    │ list[str] ┆ i8  │
    ╞═══════════╪═════╡
    │ ["X"]     ┆ 3   │
    │ ["Y"]     ┆ 4   │
    │ []        ┆ 5   │
    └───────────┴─────┘