Search code examples
pandaspd.grouper

future warning on pandas groupby(pd.Grouper(key='Date',freq='M')).agg({'Purchases':[ np.count_nonzero,np.sum]})


python==3.12.0 (tags/v3.12.0:0fb18b0, Oct 2 2023, 13:03:39) [MSC v.1935 64 bit (AMD64)] on win32

import pandas as pd #pandas==2.1.3
import numpy as np #numpy==1.26.1

dfcardtype:

     Date       Purchases    Credits    Card Type
9   2019-03-13       0.00     150.00         visa     
21  2019-04-13    2031.67       0.00         visa
27  2019-04-24    1311.00       0.00         visa
33  2019-05-09    1021.11       0.00         visa
36  2019-05-17    1860.00       0.00         visa

df = dfcardtype.groupby(pd.Grouper(key='Date',freq='M')).agg({'Purchases':[ np.count_nonzero,np.sum]})

expected result: count and sum the Purchases column monthwise excluding zero cells

FutureWarning: The provided callable <function sum at 0x000001A352B6B380> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.

df = dfcardtype.groupby(pd.Grouper(key='Date',freq='M')).agg({'Purchases':[ np.count_nonzero,np.sum]})

expected result: count and sum the Purchases column monthwise excluding zero cells

FutureWarning: The provided callable <function sum at 0x000001A352B6B380> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.


Solution

  • Warning is clear - use sum instead np.sum. Also for avoid MultiIndex is possible specify column after groupby:

    df = (dfcardtype.groupby(pd.Grouper(key='Date',freq='M'))['Purchases']
                        .agg([np.count_nonzero,'sum']))
    print (df)
                count_nonzero      sum
    Date                              
    2019-03-31              0     0.00
    2019-04-30              2  3342.67
    2019-05-31              2  2881.11
    

    Also is possible set custom columns names in output by named aggregation:

    df = (dfcardtype.groupby(pd.Grouper(key='Date',freq='M'))
                        .agg(Count_no_zero=('Purchases', np.count_nonzero),
                             Sum=('Purchases','sum')))
    print (df)
                Count_no_zero      Sum
    Date                              
    2019-03-31              0     0.00
    2019-04-30              2  3342.67
    2019-05-31              2  2881.11
    

    If need MultiIndex in columns use your solution:

    df = (dfcardtype.groupby(pd.Grouper(key='Date',freq='M'))
                    .agg({'Purchases':[ np.count_nonzero,'sum']}))
    print (df)
                   Purchases         
               count_nonzero      sum
    Date                             
    2019-03-31             0     0.00
    2019-04-30             2  3342.67
    2019-05-31             2  2881.11