
Transform a string representing a list in each cell of a polars DataFrame column to an actual list


I am new to the polars library, and the title says it all regarding what I am trying to do.

With the pandas library I would use apply() together with Python's built-in eval() function, since eval("[1,2,3]") returns [1,2,3].

This can be done in polars as well (the example below shows the expected output), but polars strongly recommends using its Expression API. I searched the Expr.str namespace but didn't find an expression that does this. Am I missing something, or should I go with apply() (called map_elements() in recent polars versions)?

import polars as pl

data = {'col_string': ['[1,2,3]', '[4,5,6]']}

df = pl.DataFrame(data)
# Evaluate each string with Python's eval() to turn it into a list
df = df.with_columns(pl.col('col_string').map_elements(eval).alias('col_list'))

shape: (2, 2)
┌────────────┬───────────┐
│ col_string ┆ col_list  │
│ ---        ┆ ---       │
│ str        ┆ list[i64] │
╞════════════╪═══════════╡
│ [1,2,3]    ┆ [1, 2, 3] │
│ [4,5,6]    ┆ [4, 5, 6] │
└────────────┴───────────┘

Solution

  • As long as every string in the column is valid JSON, you can use polars.Expr.str.json_decode as follows.

    df.with_columns(
        pl.col("col_string").str.json_decode().alias("col_list")
    )
    

    Output.

    shape: (2, 2)
    ┌────────────┬───────────┐
    │ col_string ┆ col_list  │
    │ ---        ┆ ---       │
    │ str        ┆ list[i64] │
    ╞════════════╪═══════════╡
    │ [1,2,3]    ┆ [1, 2, 3] │
    │ [4,5,6]    ┆ [4, 5, 6] │
    └────────────┴───────────┘