Search code examples
pythonpandasdataframesumifs

Python/Pandas: Complicated Excel SUMIFS across dataframes


I'm working with two dataframes, One called Sales, pictured here enter image description here

And another dataframe, called Claims, which is pictured here enter image description here

I would like to create a new column called "SUMIFS" in another dataframe entitled "Subset" (i.e. Subset['SUMIFS'] = ...), and I am wondering if it is possible to perform an excel like SUMIFS operation across dataframes in pandas. The criteria for the SUMIFS would be:

sumrange -- Sales['Vehicle Count']

Criteria 1 -- Sales['Vehicle Production Month/Year'] = Claims['Vehicle Production Month/Year']

Criteria 2 -- Sales['DIS since RUN DATE'] < Claims['Claim DIS']

Here is the Sales Dataframe in spreadsheet form

enter image description here

Here is the Claims Dataframe in spreadsheet form

enter image description here

And, Finally, here is the original formula that I'm using in excel, that I would like to convert to Python: =SUMIFS(Sales!$D$2:$D$1156, Sales!$A$2:$A$1156, Claims!B2, Sales!$E$2:$E$1156, <"&Claims!O2)


Solution

  • If you have the two data frames in Python already then you can just simply do this. Note: This will only work if both dataframes are sorted. (If you have an ID column sort this before running the following code)

    Sales['Vehicle Count'][
    (Sales['Vehicle Production Month/Year'] == Claims['Vehicle Production Month/Year'])
    & ['DIS since RUN DATE'] < Claims['Claim DIS'])
    ].sum()
    

    I am essentially filter the data by the two if conditions, so what's left is the ones you want. All you have to do is to sum it up. (This is a direct compare, whole series by another series)