Search code examples
pythonpandasgroup-bysumifs

Whats the python equivalent for sumifs to solve the given problem


this is the dataframe

import pandas as pd
df = pd.DataFrame({'IDENTIFIER': ['A_xcxcxc', 'BA_bcbcbc', 'A_xcxcxc', 'A_xcxcxc', 'BA_bcbcbc', 'C_rgrg', 'BA_bcbcbc', 'D_wewerw', 'A_xcxcxc', 'A_xcxcxc'],
                   'income': [-30362100.0, 200000.0, -21248077.5, 150000.0, -33843389.2, 200000.0, -40229279.75, 250000.0, -22111384.6, 200000.0],
'Date' : ['03/03/2031', '22/01/2060', '04/03/2025', '22/07/2032', '08/03/2028', '22/11/2065', '05/04/2024', '22/03/2032', '15/10/2025', '22/07/2065']
})

I want to aggregate income for each of the identifier, but only if it falls before 1/1/2030. just to clarify if I do so in excel using sumifs I get this

enter image description here

I am assuming it can be done using groupby fuction, but not sure how to add the conditionality related to the date.


Solution

  • If you want the non-matching keys:

    >>> (df.groupby('IDENTIFIER')
         .apply(lambda x:x.loc[
                          pd.to_datetime(x.Date).lt('2030-01-01'), 
                          'income'
                      ].sum(min_count=1))
         .fillna('-'))
    
    IDENTIFIER
    A_xcxcxc    -43359462.10
    BA_bcbcbc   -74072668.95
    C_rgrg                 -
    D_wewerw               -
    

    Without using apply:

    >>> ( df['income']
            .where(pd.to_datetime(df.Date).lt('2030-01-01'))
            .groupby(df['IDENTIFIER']).sum(min_count=1).fillna('-') )
    
    IDENTIFIER
    A_xcxcxc    -43359462.10
    BA_bcbcbc   -74072668.95
    C_rgrg                 -
    D_wewerw               -
    Name: income, dtype: object
    

    NOTE: If you want np.nan instead of -, remove the fillna('-') at the end.

    Otherwise, if you only want the matching groups:

    >>> df.groupby(df.loc[
             pd.to_datetime(df.Date).lt('2030-01-01'), 
             'IDENTIFIER'
        ])['income'].sum()
    
    IDENTIFIER
    A_xcxcxc    -43359462.10
    BA_bcbcbc   -74072668.95
    Name: income, dtype: float64