
Python sum dataframe column in batches


I have a DataFrame with 100 rows and I would like to sum every X consecutive rows of one column.

I tried using rolling and cumsum, but that did not help.

Let's say this is my column:

np.arange(1,100)

What I am trying to do is sum the numbers from 1 to 10, 11 to 20, and so on (just an example with X=10).

Any other solutions?

Thanks!


Solution

  • Assuming that your index is an auto-increment starting at 0, you could do the following:

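    import numpy as np
    import pandas as pd
    # np.arange(1,100) yields the 99 integers 1..99
    df = pd.DataFrame({'a': np.arange(1,100)})
    batch_size = 10
    # integer-divide the 0-based index to get a group number per row, then sum column a per group
    df.groupby((df.index // batch_size).rename('group_number')).a.sum().reset_index()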
    

    This outputs the following dataframe:

       group_number    a
    0             0   55
    1             1  155
    2             2  255
    3             3  355
    4             4  455
    5             5  555
    6             6  655
    7             7  755
    8             8  855
    9             9  855
    

    The grouping is done by the expression df.index // batch_size, which splits the rows into groups of batch_size consecutive rows. The last group may have fewer than batch_size rows: here np.arange(1,100) yields 99 values, so group 9 holds only the nine values 91 to 99.
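If the index is not a 0-based auto-increment (for example after filtering or a custom index), one possible variation is to group by row position rather than by index label. The sketch below assumes a made-up index starting at 500 purely for illustration; the names positions and batch_sums are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical non-default index (500..598) to show that labels are ignored
df = pd.DataFrame({'a': np.arange(1, 100)}, index=np.arange(500, 599))
batch_size = 10

# Row positions 0..98, integer-divided into batch numbers 0..9
positions = np.arange(len(df)) // batch_size

# Group by the position-based batch number and sum column a per batch
batch_sums = df.groupby(positions)['a'].sum()
print(batch_sums)
```

This produces the same per-batch sums as the index-based version, since only the order of the rows matters.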

    We know that

    • the sum of all integers from 1 to 10 is 55: (1 + 10) * (10 / 2)
    • the sum of all integers from 11 to 20 is 155: (11 + 20) * (10 / 2)
    • the other group sums also check out with Gauss's summation technique, including the last group, which has only nine terms: (91 + 99) * (9 / 2) = 855.

    So my solution seems to check out.
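The manual check can also be automated. The following is a minimal sketch that recomputes each group sum with the arithmetic-series formula (first + last) * n / 2 and compares it with the groupby result; the variable names expected and actual are illustrative only, and the formula applies only because the column holds the consecutive integers 1..99.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.arange(1, 100)})
batch_size = 10

# Group sums computed by pandas, as in the solution
actual = df.groupby(df.index // batch_size)['a'].sum()

# Group sums computed with Gauss's formula
expected = []
for start in range(0, len(df), batch_size):
    first = start + 1                          # first value in this batch
    last = min(start + batch_size, len(df))    # last value; the final batch is shorter
    n = last - first + 1                       # number of terms in this batch
    expected.append((first + last) * n // 2)

assert actual.tolist() == expected
```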