I have a Dataframe that looks like this:
--------------------------------------------------------------------
|TradeGroup | Fund Name | Contribution | From | To |
| A | Fund_1 | 0.20 | 2013-01-01 | 2013-01-02 |
| B | Fund_1 | 0.10 | 2013-01-01 | 2013-01-02 |
| A | Fund_1 | 0.05 | 2013-01-03 | 2013-01-04 |
| B | Fund_1 | 0.45 | 2013-01-03 | 2013-01-04 |
--------------------------------------------------------------------
Basically, it's contribution of a trade group to a fund on a daily basis. What I would like to do is sum up all the Contributions for a Tradegroup per day for further analysis. What I want to see is:
--------------------------------------------------------------------
|TradeGroup | Fund Name | Contribution | From | To |
| A | Fund_1 | 0.25 | 2013-01-01 | 2013-01-04 |
| B | Fund_1 | 0.55 | 2013-01-01 | 2013-01-04 |
--------------------------------------------------------------------
I am not able to solve this using Dataframe. I've tried
df.groupby('TradeGroup')['Contribution'].sum()
However, this doesn't work. The equivalent SQL for this would be
Select SUM(Ctp) from Table Group By TradeGroup.
Any help would be highly appreciated. Thanks
You need to make sure the contribution column is numeric not strings to get the right matching numbers as in SQL. I think the weird nos that you are getting is due to the string nature of your 'contribution' column. Then the following should work:
import pandas as pd
import numpy as np
a=pd.DataFrame([['A','Fund_1','0.20','2013-01-01','2013-01-02'],
['B','Fund_1','0.10','2013-01-01','2013-01-02'],['A','Fund_1','0.05','2013-
01-03','2013-01-04'],['B','Fund_1','0.45','2013-01-03','2013-01-04']],
columns=['TraderGroup', 'Fund Name','Contribution','From', 'To'])
print a
a['Contribution'] = pd.to_numeric(a['Contribution'], errors='coerce')
b=a.groupby(['TraderGroup','Fund Name']).agg({'Contribution':np.sum,
'From':'min','To':'max'}).reset_index()
print b