Lets say I have 2 dataframes in pandas. I want to perform a left join on these dataframes in a very specific way as follows, and the easiest way to explain is probably via an example.
First dataframe:
Date Col1 Col2
1/1
2/1
...
Second Dataframe:
Date A B C
1/1 90 0 0
1/1 0 75 0
1/1 73 0 0
2/1 0 0 85
2/1 0 0 75
The dates in the first dataframe are unique and will be used to join to the second dataframe. Col1 and Col2 are not blank, but they are irrelevant for what I want to do. In second data frame, a date can appear multiple times, and exactly one of the columns A, B, and C contain a number greater than 0, with the other two containing 0. There could also be some extra columns which are irrelevant for what I want to do.
I want to keep everything in the first dataframe and add columns A, B and C in such a way that the values in those columns would be the sum of all values on the given data in the second dataframe. So in the example above, I would want the output to look like this:
Date Col1 Col2 A B C
1/1 163 75 0
2/1 0 0 160
...
I have looked at the pandas join function but it doesn't seem to give the option to sum columns in the way that I want. Is there any other way to achieve such a result? I realise I could probably write a for loop to do this, but I was hoping there might be a more efficient way.
Do a groupby
sum on df1, and merge
the result with df0
:
In [127]: df1.groupby("Date", as_index=False).sum()
Out[127]:
Date A B C
0 1/1 163 75 0
1 2/1 0 0 160
In [128]: df0.merge(df1.groupby("Date", as_index=False).sum())
Out[128]:
Date Col1 Col2 A B C
0 1/1 NaN NaN 163 75 0
1 2/1 NaN NaN 0 0 160
(I'm assuming here that Date is a column, and not an index, but it's easy to either reset_index()
or use level=0
instead.)