Search code examples
pythonpython-3.xpandaspandas-groupbypandasql

Group by and Sum in Pandas without losing columns


I have a Dataframe that looks like this:

--------------------------------------------------------------------
|TradeGroup | Fund Name | Contribution | From       | To           |
|  A        | Fund_1    |   0.20       | 2013-01-01 | 2013-01-02   |
|  B        | Fund_1    |   0.10       | 2013-01-01 | 2013-01-02   |
|  A        | Fund_1    |   0.05       | 2013-01-03 | 2013-01-04   |
|  B        | Fund_1    |   0.45       | 2013-01-03 | 2013-01-04   |
--------------------------------------------------------------------

Basically, it's contribution of a trade group to a fund on a daily basis. What I would like to do is sum up all the Contributions for a Tradegroup per day for further analysis. What I want to see is:

--------------------------------------------------------------------
|TradeGroup | Fund Name | Contribution | From       | To           |
|  A        | Fund_1    |   0.25       | 2013-01-01 | 2013-01-04   |
|  B        | Fund_1    |   0.55       | 2013-01-01 | 2013-01-04   |
--------------------------------------------------------------------

I am not able to solve this using Dataframe. I've tried

df.groupby('TradeGroup')['Contribution'].sum()

However, this doesn't work. The equivalent SQL for this would be

Select SUM(Ctp) from Table Group By TradeGroup. 

Any help would be highly appreciated. Thanks


Solution

  • You need to make sure the contribution column is numeric not strings to get the right matching numbers as in SQL. I think the weird nos that you are getting is due to the string nature of your 'contribution' column. Then the following should work:

    import pandas as pd
    import numpy as np
    a=pd.DataFrame([['A','Fund_1','0.20','2013-01-01','2013-01-02'],
    ['B','Fund_1','0.10','2013-01-01','2013-01-02'],['A','Fund_1','0.05','2013-
    01-03','2013-01-04'],['B','Fund_1','0.45','2013-01-03','2013-01-04']],
                columns=['TraderGroup', 'Fund Name','Contribution','From', 'To'])
    print a
    a['Contribution'] = pd.to_numeric(a['Contribution'], errors='coerce')
    b=a.groupby(['TraderGroup','Fund Name']).agg({'Contribution':np.sum,
                                             'From':'min','To':'max'}).reset_index()
    print b