Search code examples
pandasdataframepandas-groupby

Pandas groupby().sum() is not ignoring None/empty/np.nan values


I am facing a weird problem with Pandas groupby().sum() together.

I have a DF with 2 category columns and 3 numeric/float value columns - value columns have Nones as shown below:

  cat_1 cat_2  val_1  val_2  val_3
0     b     z    0.1    NaN    NaN
1     b     x    0.1    NaN    NaN
2     c     y    0.1    1.0    NaN
3     c     z    0.1    NaN    NaN
4     c     x    0.1    1.0    NaN

I want to group-by cat_1 and then sum the value columns: val_1, val_2 and val_3 per category.

This is what the final aggregation should look like:


  cat_1  val_1  val_2  val_3
0     b    0.2    NaN    NaN
1     c    0.3    2.0    NaN

Problem: is that when I do df.groupby(["cat_1"], as_index=False).sum(), I get 0.0 for categories where all values are None/null:

  cat_1  val_1  val_2  val_3
0     b    0.2    0.0    0.0
1     c    0.3    2.0    0.0

How can I fix this issue? Thank you.


Solution

  • It seems there is no direct way of doing it like via DataFrame.sum(skipna=False) but it is not implemented for groupby-sum. However, as ScootCork mentioned, the required behaviour can be achieved using: groupby().sum(min_count=1) - from documentation.