Convert empty lists to nulls


I have a polars DataFrame with two list columns.

However, one column contains empty lists while the other contains nulls.

For consistency, I would like to convert the empty lists to nulls.


In [306]: df[["spcLink", "proprietors"]]
Out[306]: 
shape: (254_654, 2)
┌───────────┬─────────────────────────────────┐
│ spcLink   ┆ proprietors                     │
│ ---       ┆ ---                             │
│ list[str] ┆ list[str]                       │
╞═══════════╪═════════════════════════════════╡
│ []        ┆ null                            │
│ []        ┆ null                            │
│ []        ┆ null                            │
│ []        ┆ null                            │
│ []        ┆ null                            │
│ …         ┆ …                               │
│ []        ┆ ["The Steel Company of Canada … │
│ []        ┆ ["Philips' Gloeilampenfabrieke… │
│ []        ┆ ["AEG-Telefunken"]              │
│ []        ┆ ["xxxx…                         │
│ []        ┆ ["yyyy…                         │
└───────────┴─────────────────────────────────┘

I have attempted this:

# Convert empty lists to None
for col, dtype in df.schema.items():
    if isinstance(dtype, pl.datatypes.List):
        print(col, dtype)
        df = df.with_columns(
            pl.when(pl.col(col).list.len() == 0).then(None).otherwise(pl.col(col))
        )

But the output does not change: the empty lists remain and are not converted to nulls.


Solution

    import polars as pl

    # Minimal example: one column holds empty lists, the other holds nulls
    df = pl.DataFrame({
        "spcLink": [[],[]],
        "proprietors": [None,["xxx"]]
    }, schema={"spcLink": pl.List(pl.String), "proprietors": pl.List(pl.String)})
    
    ┌───────────┬─────────────┐
    │ spcLink   ┆ proprietors │
    │ ---       ┆ ---         │
    │ list[str] ┆ list[str]   │
    ╞═══════════╪═════════════╡
    │ []        ┆ null        │
    │ []        ┆ ["xxx"]     │
    └───────────┴─────────────┘
    
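    import polars.selectors as cs
    
    # when/then without otherwise returns null wherever the condition is
    # not true, so empty lists (length 0) become null.
    df.with_columns(
        pl.when(
            cs.by_dtype(pl.List(pl.String)).list.len() > 0
        ).then(
            cs.by_dtype(pl.List(pl.String))
        )
    )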
    
    ┌───────────┬─────────────┐
    │ spcLink   ┆ proprietors │
    │ ---       ┆ ---         │
    │ list[str] ┆ list[str]   │
    ╞═══════════╪═════════════╡
    │ null      ┆ null        │
    │ null      ┆ ["xxx"]     │
    └───────────┴─────────────┘