I'm working with two dataframes, One called Sales, pictured here
And another dataframe, called Claims, which is pictured here
I would like to create a new column called "SUMIFS" in another dataframe entitled "Subset" (i.e. Subset['SUMIFS'] = ...), and I am wondering if it is possible to perform an excel like SUMIFS operation across dataframes in pandas. The criteria for the SUMIFS would be:
sumrange -- Sales['Vehicle Count']
Criteria 1 -- Sales['Vehicle Production Month/Year'] = Claims['Vehicle Production Month/Year']
Criteria 2 -- Sales['DIS since RUN DATE'] < Claims['Claim DIS']
Here is the Sales Dataframe in spreadsheet form
Here is the Claims Dataframe in spreadsheet form
And, Finally, here is the original formula that I'm using in excel, that I would like to convert to Python: =SUMIFS(Sales!$D$2:$D$1156, Sales!$A$2:$A$1156, Claims!B2, Sales!$E$2:$E$1156, <"&Claims!O2)
If you have the two data frames in Python already then you can just simply do this. Note: This will only work if both dataframes are sorted. (If you have an ID column sort this before running the following code)
Sales['Vehicle Count'][
(Sales['Vehicle Production Month/Year'] == Claims['Vehicle Production Month/Year'])
& ['DIS since RUN DATE'] < Claims['Claim DIS'])
].sum()
I am essentially filter the data by the two if conditions, so what's left is the ones you want. All you have to do is to sum it up. (This is a direct compare, whole series by another series)