I have a DataFrame with 100 rows,
and I would like to sum every X consecutive rows of one column.
I tried using rolling and cumsum, but they did not help.
Let's say this is my column:
np.arange(1,100)
What I am trying to do is sum the numbers from 1 to 10, from 11 to 20, and so on (this is just an example with X = 10).
Any other solutions?
Thanks!
Assuming that your index is the default auto-incrementing integer index starting at 0 (0, 1, 2, ...), you could do the following:
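import numpy as np
import pandas as pd
df = pd.DataFrame({'a': np.arange(1,100)})
batch_size = 10
# Integer-divide each index label by batch_size: rows 0-9 map to group 0,
# rows 10-19 map to group 1, and so on. Then sum column 'a' within each group.
df.groupby((df.index // batch_size).rename('group_number')).a.sum().reset_index()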
This outputs the following dataframe:
   group_number    a
0             0   55
1             1  155
2             2  255
3             3  355
4             4  455
5             5  555
6             6  655
7             7  755
8             8  855
9             9  855
The grouping is done by the expression df.index // batch_size, which splits the rows into groups of batch_size consecutive rows; only the last group may have fewer than batch_size rows.
Here np.arange(1,100) produces 99 values (1 to 99), so the last group holds only 91 to 99, which happens to sum to 855 as well.
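If the index is not a default 0-based index (for example after filtering or sorting), df.index // batch_size no longer produces groups of consecutive rows. A minimal sketch, assuming you want to group rows by their current position rather than by their index labels, is to build the group keys from np.arange(len(df)) instead. The index used below (0, 3, 6, ...) and the name positions are only illustrative.

```python
import numpy as np
import pandas as pd

# Same values as above, but with a non-default index (0, 3, 6, ..., 294),
# as might result from filtering; df.index // batch_size would give uneven groups here
df = pd.DataFrame({'a': np.arange(1, 100)}, index=np.arange(0, 297, 3))
batch_size = 10

# Group by row position (0, 1, 2, ...) instead of by index label
positions = np.arange(len(df)) // batch_size
result = df.groupby(positions)['a'].sum()

print(result.tolist())  # [55, 155, 255, 355, 455, 555, 655, 755, 855, 855]
```

Because the keys depend only on row order, this gives the same sums as the answer above regardless of what the index contains.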
We know from the arithmetic-series formula, (first + last) * count / 2, that
(1 + 10) * (10 / 2) = 55
(11 + 20) * (10 / 2) = 155
so my solution seems to check out.
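The same check can be automated for every group. A minimal sketch, assuming the df and batch_size from the answer: it takes each group's first value, last value and row count, and applies (first + last) * count / 2. This formula only holds here because each group is a run of consecutive integers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.arange(1, 100)})
batch_size = 10

grouped = df.groupby(df.index // batch_size)['a']
sums = grouped.sum()

# Arithmetic-series formula per group: (first + last) * count / 2
expected = (grouped.first() + grouped.last()) * grouped.count() / 2

print((sums == expected).all())  # True
```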