I am facing a weird problem with Pandas groupby().sum()
together.
I have a DF with 2 category columns and 3 numeric/float
value columns - value columns have None
s as shown below:
cat_1 cat_2 val_1 val_2 val_3
0 b z 0.1 NaN NaN
1 b x 0.1 NaN NaN
2 c y 0.1 1.0 NaN
3 c z 0.1 NaN NaN
4 c x 0.1 1.0 NaN
I want to group-by cat_1
and then sum the value columns: val_1
, val_2
and val_3
per category.
This is what the final aggregation should look like:
cat_1 val_1 val_2 val_3
0 b 0.2 NaN NaN
1 c 0.3 2.0 NaN
Problem: is that when I do df.groupby(["cat_1"], as_index=False).sum()
, I get 0.0
for categories where all values are None/null
:
cat_1 val_1 val_2 val_3
0 b 0.2 0.0 0.0
1 c 0.3 2.0 0.0
How can I fix this issue? Thank you.
It seems there is no direct way of doing it like via DataFrame.sum(skipna=False)
but it is not implemented for groupby-sum. However, as ScootCork mentioned, the required behaviour can be achieved using: groupby().sum(min_count=1)
- from documentation.