Search code examples
pythonpandasdataframemultiple-columnscumsum

Pandas Cumsum by Same Name Columns Not Working


I have an unusual task in that I need to calculate a cumsum answer by column with the same column name and then by rows in months (1-12). I think that there is a problem in python with using cumsum on non-unique column names. For example, my data looks like this in DataFrame df:

month   Adj_Prod            Adj_Prod            Adj_Prod
1       441215.6939045699   441215.6939045699   441215.6939045699
2       519150.654291369    519150.654291369    519150.654291369
3       479624.188336129    479624.188336129    479624.188336129
4       444504.2012028889   444504.2012028889   444504.2012028889
5       327232.2926583468   327232.2926583468   327232.2926583468
6       270260.9249640972   270260.9249640972   270260.9249640972
7       245660.8295624059   245660.8295624059   245660.8295624059
8       174337.2269775      174337.2269775      174337.2269775
9       249508.2308332307   281902.4467308491   240007.62178783625
10      335156.9175691739   409076.49866234337  370223.7513004631
11      410662.9605211421   413049.84287055075  454906.31546153163
12      443616.23516782525  445685.3862165406   446864.45394745696

I need my final answer to look like this:

month   Adj_Prod    Adj_Prod    Adj_Prod
1       441216      441216      441216
2       960366      960366      960366
3      1439991     1439991     1439991
4      1884495     1884495     1884495
5      2211727     2211727     2211727
6      2481988     2481988     2481988
7      2727649     2727649     2727649
8      2901986     2901986     2901986
9      3151494     3183888     3141994
10     3486651     3592965     3512217
11     3897314     4006015     3967124
12     4340930     4451700     4413988

This should work but the cumsum does nothing - it just gives me the exact same data in the DataFrame df above. Here is my most basic attempt:

df.groupby('month')['Adj_Prod'].cumsum()

Solution

  • Although it's highly encourage to not have the dataframe with same column names, you can use df.apply

    df['Adj_Prod'].apply(lambda x: x.cumsum())
    
            Adj_Prod      Adj_Prod      Adj_Prod
    0   4.412157e+05  4.412157e+05  4.412157e+05
    1   9.603663e+05  9.603663e+05  9.603663e+05
    2   1.439991e+06  1.439991e+06  1.439991e+06
    3   1.884495e+06  1.884495e+06  1.884495e+06
    4   2.211727e+06  2.211727e+06  2.211727e+06
    5   2.481988e+06  2.481988e+06  2.481988e+06
    6   2.656325e+06  2.656325e+06  2.656325e+06
    7   2.896333e+06  2.896333e+06  2.896333e+06
    8   3.266557e+06  3.266557e+06  3.266557e+06
    9   3.721463e+06  3.721463e+06  3.721463e+06
    10  4.168327e+06  4.168327e+06  4.168327e+06