Search code examples
pythonpandassumifs

Excel SUMIFs with multiple conditions, including day inequalities in pandas


The Excel function SUMIFS supports calculation based on multiple criteria, including day inequalities, as follows

values_to_sum, criteria_range_n, condition_n, .., criteria_range_n, condition_n

Example

Input - tips per person per day, multiple entries per person per day allowed

date        person  tip
02/03/2022  X       10
05/03/2022  X       30
05/03/2022  Y       20
08/03/2022  X       12
08/03/2022  X       8

Output - sum per selected person per day

date        X_sum_per_day
01/03/2022  0
02/03/2022  10
03/03/2022  0
04/03/2022  0
05/03/2022  30
06/03/2022  0
07/03/2022  0
08/03/2022  20
09/03/2022  0
10/03/2022  0

enter image description here

Can this be implemented in pandas and calculated as series for an input range of days? Cumulative would be presumably just application of cumsum() but the initial sum based on multiple criteria is tricky, especially if to be concise.

Code

import pandas as pd

df = pd.DataFrame({'date': ['02-03-2022 00:00:00',
                            '05-03-2022 00:00:00',
                            '05-03-2022 00:00:00',
                            '08-03-2022 00:00:00',
                            '08-03-2022 00:00:00'],
                   'person': ['X', 'X', 'Y', 'X', 'X'],
                   'tip': [10, 30, 20, 12, 8]},
                   index = [0, 1, 2, 3, 4])

df2 = pd.DataFrame({'date':pd.date_range(start='2022-03-01', end='2022-03-10')})
temp = df[df['person'] == 'X'].groupby(['date']).sum().reset_index()
df2['X_sum'] = df2['date'].map(temp.set_index('date')['tip']).fillna(0)

The above seems kinda hacky and not as simple to reason about as Excel SUMIFS. Additional conditions would also be a hassle (e.g. sum where country = X, company = Y, person = Z).

Any idea for alternative implementation?


Solution

  • IIUC, you want to filter the person X then groupby day and sum the tips, finally reindex the missing days:

    df['date'] = pd.to_datetime(df['date'], dayfirst=True)
    
    out = (df[df['person'].eq('X')]
     .groupby('date')['tip'].sum()
     .reindex(pd.date_range(start='2022-03-01', end='2022-03-10'),
              fill_value=0)
     .reset_index()
    )
    

    output:

           index  tip
    0 2022-03-01    0
    1 2022-03-02   10
    2 2022-03-03    0
    3 2022-03-04    0
    4 2022-03-05   30
    5 2022-03-06    0
    6 2022-03-07    0
    7 2022-03-08   20
    8 2022-03-09    0
    9 2022-03-10    0