How to ignore EMPTY/NULL value columns while grouping in python polars?

I have a dataframe.

df = pl.DataFrame({
    'last_name':['James','Warner','Marino','James','Warner','Marino','James'],
    'first_name':['Horn','Bro','Kach','Horn','Bro','Kach','Horn'],
    'dob':['03/06/1990','09/16/1990','03/06/1990','','03/06/1990','','']
})

I'm applying a grouping on last,first and dob columns to get the counts as

df.group_by('last_name','first_name','dob').len()

shape: (6, 4)
┌───────────┬────────────┬────────────┬─────┐
│ last_name ┆ first_name ┆ dob        ┆ len │
│ ---       ┆ ---        ┆ ---        ┆ --- │
│ str       ┆ str        ┆ str        ┆ u32 │
╞═══════════╪════════════╪════════════╪═════╡
│ Warner    ┆ Bro        ┆ 09/16/1990 ┆ 1   │
│ Marino    ┆ Kach       ┆ 03/06/1990 ┆ 1   │
│ Marino    ┆ Kach       ┆            ┆ 1   │
│ James     ┆ Horn       ┆            ┆ 2   │ # NOT OK
│ James     ┆ Horn       ┆ 03/06/1990 ┆ 1   │
│ Warner    ┆ Bro        ┆ 03/06/1990 ┆ 1   │
└───────────┴────────────┴────────────┴─────┘

Here i would like to ignore the NULL/EMPTY values on grouping columns.

As James Horn has two empty DOB's these should not be included to grouping operation.

Here is the expected output.

shape: (4, 4)
┌───────────┬────────────┬────────────┬─────┐
│ last_name ┆ first_name ┆ dob        ┆ len │
│ ---       ┆ ---        ┆ ---        ┆ --- │
│ str       ┆ str        ┆ str        ┆ u32 │
╞═══════════╪════════════╪════════════╪═════╡
│ Warner    ┆ Bro        ┆ 03/06/1990 ┆ 1   │
│ James     ┆ Horn       ┆ 03/06/1990 ┆ 1   │
│ Marino    ┆ Kach       ┆ 03/06/1990 ┆ 1   │
│ Warner    ┆ Bro        ┆ 09/16/1990 ┆ 1   │
└───────────┴────────────┴────────────┴─────┘

Of course we can do filter on the column as below before pass to grouping as

df.filter(pl.col.dob != '').group_by('last_name','first_name','dob').len()

But what if I have 10 columns to be specified in filter operation? I need to write them one after another.

Is there any other solution for it?

Solution

First replace empty strings with null values and then use drop_nulls

(
    df
    .with_columns(
        pl.when(pl.col(group_columns).str.len_chars() == 0)
        .then(None)
        .otherwise(pl.col(group_columns))
        .name.keep()
    )
    .drop_nulls(group_columns)
    .group_by(group_columns)
    .len()
)

shape: (4, 4)
┌───────────┬────────────┬────────────┬───────┐
│ last_name ┆ first_name ┆ dob        ┆ len   │
│ ---       ┆ ---        ┆ ---        ┆ ---   │
│ str       ┆ str        ┆ str        ┆ u32   │
╞═══════════╪════════════╪════════════╪═══════╡
│ Warner    ┆ Bro        ┆ 09/16/1990 ┆ 1     │
│ Marino    ┆ Kach       ┆ 03/06/1990 ┆ 1     │
│ Warner    ┆ Bro        ┆ 03/06/1990 ┆ 1     │
│ James     ┆ Horn       ┆ 03/06/1990 ┆ 1     │
└───────────┴────────────┴────────────┴───────┘