I have a DataFrame with too many rows.
Example input:
No col1 col2 col3 col4
1 0 5 6 8
2 0 5 7 8
3 0 7 5 2
4 0 4 4 5
. . . . .
. . . . .
. . . . .
output:
New_No col1 col2 col3 col4
1 0 5.66 6 6
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
I want to collapse every 3 rows into 1 row by taking their average (the mean of each block of 3 rows).
How can I do this?
You can group the rows with groupby and then take the mean of each group:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randint(0, 10, (9, 5)))
>>> df
   0  1  2  3  4
0  9  7  9  8  8
1  5  5  5  5  7
2  6  5  3  3  0
3  5  2  9  3  3
4  6  0  5  9  4
5  9  8  9  2  3
6  6  9  8  7  2
7  8  1  9  7  6
8  7  9  2  2  8
>>> df.groupby(np.arange(len(df))//3).mean()
          0         1         2         3         4
0  6.666667  5.666667  5.666667  5.333333  5.000000
1  6.666667  3.333333  7.666667  4.666667  3.333333
2  7.000000  6.333333  6.333333  5.333333  5.333333
This works because when we divide the range by 3, we get clusters of 3:
>>> np.arange(len(df))//3
array([0, 0, 0, 1, 1, 1, 2, 2, 2])
and we can group on these numbers. If the total number of rows isn't divisible by 3, the last group simply has fewer rows (say 2), and its mean is still computed over only the rows it actually contains.
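To tie this back to the question, here is a minimal sketch that applies the same idea to the four rows shown in the example input. The column names col1 to col4 and the New_No index name come from the question; `group_size`, `keys`, and `result` are names made up for illustration. Because there are 4 rows, the last group holds only one row, which shows the leftover case.

```python
import numpy as np
import pandas as pd

# The four rows shown in the question's example input.
df = pd.DataFrame(
    {
        "col1": [0, 0, 0, 0],
        "col2": [5, 5, 7, 4],
        "col3": [6, 7, 5, 4],
        "col4": [8, 8, 2, 5],
    }
)

# Integer-divide the row positions by the group size: [0, 0, 0, 1].
group_size = 3
keys = np.arange(len(df)) // group_size

# Rows 0-2 are averaged together; the leftover row 3 forms its own group.
result = df.groupby(keys).mean()

# Renumber the groups from 1 to match the New_No column in the desired output.
result.index = result.index + 1
result.index.name = "New_No"
print(result)
```

The first output row averages the first three input rows (col2 comes out at 17/3, about 5.67), and the second row is just the fourth input row on its own. Grouping on row positions rather than on the existing index means this works even if the index is not a clean 0..n-1 range.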