I have a dataframe.
df = pl.DataFrame({
'last_name':['James','Warner','Marino','James','Warner','Marino','James'],
'first_name':['Horn','Bro','Kach','Horn','Bro','Kach','Horn'],
'dob':['03/06/1990','09/16/1990','03/06/1990','','03/06/1990','','']
})
I'm applying a grouping on last,first and dob columns to get the counts as
df.group_by('last_name','first_name','dob').len()
shape: (6, 4)
┌───────────┬────────────┬────────────┬─────┐
│ last_name ┆ first_name ┆ dob ┆ len │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ u32 │
╞═══════════╪════════════╪════════════╪═════╡
│ Warner ┆ Bro ┆ 09/16/1990 ┆ 1 │
│ Marino ┆ Kach ┆ 03/06/1990 ┆ 1 │
│ Marino ┆ Kach ┆ ┆ 1 │
│ James ┆ Horn ┆ ┆ 2 │ # NOT OK
│ James ┆ Horn ┆ 03/06/1990 ┆ 1 │
│ Warner ┆ Bro ┆ 03/06/1990 ┆ 1 │
└───────────┴────────────┴────────────┴─────┘
Here i would like to ignore the NULL/EMPTY values on grouping columns.
As James Horn has two empty DOB's these should not be included to grouping operation.
Here is the expected output.
shape: (4, 4)
┌───────────┬────────────┬────────────┬─────┐
│ last_name ┆ first_name ┆ dob ┆ len │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ u32 │
╞═══════════╪════════════╪════════════╪═════╡
│ Warner ┆ Bro ┆ 03/06/1990 ┆ 1 │
│ James ┆ Horn ┆ 03/06/1990 ┆ 1 │
│ Marino ┆ Kach ┆ 03/06/1990 ┆ 1 │
│ Warner ┆ Bro ┆ 09/16/1990 ┆ 1 │
└───────────┴────────────┴────────────┴─────┘
Of course we can do filter on the column as below before pass to grouping as
df.filter(pl.col.dob != '').group_by('last_name','first_name','dob').len()
But what if I have 10 columns to be specified in filter operation? I need to write them one after another.
Is there any other solution for it?
First replace empty strings with null
values and then use drop_nulls
(
df
.with_columns(
pl.when(pl.col(group_columns).str.len_chars() == 0)
.then(None)
.otherwise(pl.col(group_columns))
.name.keep()
)
.drop_nulls(group_columns)
.group_by(group_columns)
.len()
)
shape: (4, 4)
┌───────────┬────────────┬────────────┬───────┐
│ last_name ┆ first_name ┆ dob ┆ len │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ u32 │
╞═══════════╪════════════╪════════════╪═══════╡
│ Warner ┆ Bro ┆ 09/16/1990 ┆ 1 │
│ Marino ┆ Kach ┆ 03/06/1990 ┆ 1 │
│ Warner ┆ Bro ┆ 03/06/1990 ┆ 1 │
│ James ┆ Horn ┆ 03/06/1990 ┆ 1 │
└───────────┴────────────┴────────────┴───────┘