I have a polars DataFrame with two list columns.
However, one column contains empty lists and the other contains nulls.
For consistency, I would like to convert the empty lists to nulls.
In [306]: df[["spcLink", "proprietors"]]
Out[306]:
shape: (254_654, 2)
┌───────────┬─────────────────────────────────┐
│ spcLink   ┆ proprietors                     │
│ ---       ┆ ---                             │
│ list[str] ┆ list[str]                       │
╞═══════════╪═════════════════════════════════╡
│ []        ┆ null                            │
│ []        ┆ null                            │
│ []        ┆ null                            │
│ []        ┆ null                            │
│ []        ┆ null                            │
│ …         ┆ …                               │
│ []        ┆ ["The Steel Company of Canada … │
│ []        ┆ ["Philips' Gloeilampenfabrieke… │
│ []        ┆ ["AEG-Telefunken"]              │
│ []        ┆ ["xxxx…                         │
│ []        ┆ ["yyyy…                         │
└───────────┴─────────────────────────────────┘
I have attempted this:
# Convert empty lists to None
for col, dtype in df.schema.items():
    if isinstance(dtype, pl.datatypes.List):
        print(col, dtype)
        df = df.with_columns(
            pl.when(pl.col(col).list.len() == 0).then(None).otherwise(pl.col(col))
        )
But the output does not change: the empty lists remain and are not converted to nulls.
Use selectors.by_dtype to select all columns of type pl.List(pl.String), and .list.len() to determine whether a list is empty.

import polars as pl

df = pl.DataFrame({
    "spcLink": [[], []],
    "proprietors": [None, ["xxx"]]
}, schema={"spcLink": pl.List(pl.String), "proprietors": pl.List(pl.String)})
┌───────────┬─────────────┐
│ spcLink   ┆ proprietors │
│ ---       ┆ ---         │
│ list[str] ┆ list[str]   │
╞═══════════╪═════════════╡
│ []        ┆ null        │
│ []        ┆ ["xxx"]     │
└───────────┴─────────────┘
import polars.selectors as cs

df.with_columns(
    pl.when(
        cs.by_dtype(pl.List(pl.String)).list.len() > 0
    ).then(
        cs.by_dtype(pl.List(pl.String))
    )
)
┌───────────┬─────────────┐
│ spcLink   ┆ proprietors │
│ ---       ┆ ---         │
│ list[str] ┆ list[str]   │
╞═══════════╪═════════════╡
│ null      ┆ null        │
│ null      ┆ ["xxx"]     │
└───────────┴─────────────┘