this is the dataframe
import pandas as pd
df = pd.DataFrame({'IDENTIFIER': ['A_xcxcxc', 'BA_bcbcbc', 'A_xcxcxc', 'A_xcxcxc', 'BA_bcbcbc', 'C_rgrg', 'BA_bcbcbc', 'D_wewerw', 'A_xcxcxc', 'A_xcxcxc'],
'income': [-30362100.0, 200000.0, -21248077.5, 150000.0, -33843389.2, 200000.0, -40229279.75, 250000.0, -22111384.6, 200000.0],
'Date' : ['03/03/2031', '22/01/2060', '04/03/2025', '22/07/2032', '08/03/2028', '22/11/2065', '05/04/2024', '22/03/2032', '15/10/2025', '22/07/2065']
})
I want to aggregate income for each of the identifier, but only if it falls before 1/1/2030. just to clarify if I do so in excel using sumifs I get this
I am assuming it can be done using groupby fuction, but not sure how to add the conditionality related to the date.
If you want the non-matching keys:
>>> (df.groupby('IDENTIFIER')
.apply(lambda x:x.loc[
pd.to_datetime(x.Date).lt('2030-01-01'),
'income'
].sum(min_count=1))
.fillna('-'))
IDENTIFIER
A_xcxcxc -43359462.10
BA_bcbcbc -74072668.95
C_rgrg -
D_wewerw -
Without using apply
:
>>> ( df['income']
.where(pd.to_datetime(df.Date).lt('2030-01-01'))
.groupby(df['IDENTIFIER']).sum(min_count=1).fillna('-') )
IDENTIFIER
A_xcxcxc -43359462.10
BA_bcbcbc -74072668.95
C_rgrg -
D_wewerw -
Name: income, dtype: object
NOTE: If you want np.nan
instead of -
, remove the fillna('-')
at the end.
Otherwise, if you only want the matching groups:
>>> df.groupby(df.loc[
pd.to_datetime(df.Date).lt('2030-01-01'),
'IDENTIFIER'
])['income'].sum()
IDENTIFIER
A_xcxcxc -43359462.10
BA_bcbcbc -74072668.95
Name: income, dtype: float64