Given a polars dataframe with empty lists in column values. How can I replace them with pl.Null
or None
in order to count them as missing values?
Input:
df = pl.DataFrame([
pl.Series('contacts', [[], [], ['1081'], ['1313'], ['3657']], dtype=pl.List(pl.String)),
pl.Series('line_items', [[], [], [], [], []], dtype=pl.List(pl.String)),
])
I have tried:
df.with_columns(
pl.when(pl.col(pl.List(pl.Null)))
.then(None)
.otherwise(pl.col(pl.List))
.name.keep()
)
But the []
are still present in the output.
Instead, I want this:
shape: (5, 2)
┌───────────┬────────────┐
│ contacts ┆ line_items │
│ --- ┆ --- │
│ list[str] ┆ list[str] │
╞═══════════╪════════════╡
│ null ┆ null │
│ null ┆ null │
│ ["1081"] ┆ null │
│ ["1313"] ┆ null │
│ ["3657"] ┆ null │
└───────────┴────────────┘
As @Dean MacGregor mentioned, you can do this with .list.len()==0
Here is the code for it:
# transformation for 1 column
df.with_columns(
pl.when(pl.col('contacts').list.len() == 0)
.then(None)
.otherwise(pl.col('contacts'))
.name.keep()
)
shape: (5, 2)
┌───────────┬────────────┐
│ contacts ┆ line_items │
│ --- ┆ --- │
│ list[str] ┆ list[str] │
╞═══════════╪════════════╡
│ null ┆ [] │
│ null ┆ [] │
│ ["1081"] ┆ [] │
│ ["1313"] ┆ [] │
│ ["3657"] ┆ [] │
└───────────┴────────────┘
# transformation for all columns of datatype List(Str)
df.with_columns(
pl.when(pl.col(pl.List(pl.String)).list.len() == 0)
.then(None)
.otherwise(pl.col(pl.List(pl.String)))
.name.keep()
)
shape: (5, 2)
┌───────────┬────────────┐
│ contacts ┆ line_items │
│ --- ┆ --- │
│ list[str] ┆ list[str] │
╞═══════════╪════════════╡
│ null ┆ null │
│ null ┆ null │
│ ["1081"] ┆ null │
│ ["1313"] ┆ null │
│ ["3657"] ┆ null │
└───────────┴────────────┘