
pandas aggregate count in dataframe


I have a DataFrame and I am calling .aggregate({'col1': np.sum}), which sums the values in col1. Is it possible to compute a count instead, with something like .aggregate({'col1': some count function here})?


Solution

  • You can use 'size', 'count', or 'nunique', depending on your use case. They differ as follows:

    • 'size': the number of rows, including NaN and repeated values.
    • 'count': the number of non-NaN values, including repeats.
    • 'nunique': the number of distinct values, excluding repeats and NaN.

    For example, consider the following DataFrame:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'col0': list('aabbcc'), 'col1': [1, 1, 2, np.nan, 3, 4]})
    
      col0  col1
    0    a   1.0
    1    a   1.0
    2    b   2.0
    3    b   NaN
    4    c   3.0
    5    c   4.0
    

    Applying all three aggregations to col1, grouped by col0:

    df.groupby('col0')['col1'].agg(['size', 'count', 'nunique'])
    
          size  count  nunique
    col0                      
    a        2      2        1
    b        2      1        1
    c        2      2        2
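
To tie this back to the dict form used in the question, here is a minimal sketch that reuses the example DataFrame above. The variable names (counts, total, summary) and the output column names (n_rows, n_non_null, n_unique) are made up for illustration. It assumes a pandas version that supports named aggregation (0.25 or later).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col0': list('aabbcc'), 'col1': [1, 1, 2, np.nan, 3, 4]})

# Dict form, as in the question: map the column name to an aggregation name.
counts = df.groupby('col0').agg({'col1': 'count'})

# Without grouping, the same dict form works on the whole DataFrame
# and returns a Series indexed by column name.
total = df.agg({'col1': 'count'})

# Named aggregation: several counts of the same column, each with its own
# (hypothetical) output column name.
summary = df.groupby('col0').agg(
    n_rows=('col1', 'size'),
    n_non_null=('col1', 'count'),
    n_unique=('col1', 'nunique'),
)

print(counts)
print(total)
print(summary)
```

Passing the aggregation as a string keeps the call independent of NumPy, and the named form avoids having to rename columns afterwards.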