I have data as below, with 7 columns. I want to merge all rows that have the same value in columns 1-6, and sum the corresponding column-7 data.
0.1 0.2 0.3 0.1 0.2 0.3 0.001
0.1 0.2 0.3 0.1 0.2 0.3 0.002
0.9 0.9 0.9 0.1 0.1 0.1 0.002
0.9 0.9 0.9 0.8 0.8 0.8 0.1
0.9 0.9 0.9 0.8 0.8 0.8 0.2
0.1 0.2 0.3 0.1 0.2 0.3 0.001
0.1 0.2 0.3 0.1 0.1 0.1 0.002
0.9 0.9 0.9 0.1 0.1 0.1 0.002
I had tried this code:
import pandas as pd
df1 = pd.read_csv('test-data',sep='\s+',header=None,engine='python')
key1_sum = df1.groupby([0,1,2,3,4,5]).sum()
print(key1_sum)
I expected to get this output:
0.1 0.2 0.3 0.1 0.1 0.1 0.002
0.1 0.2 0.3 0.1 0.2 0.3 0.004
0.9 0.9 0.9 0.1 0.1 0.1 0.004
0.9 0.9 0.9 0.8 0.8 0.8 0.300
...but I actually got this output:
6
0 1 2 3 4 5
0.1 0.2 0.3 0.1 0.1 0.1 0.002
0.2 0.3 0.004
0.9 0.9 0.9 0.1 0.1 0.1 0.004
0.8 0.8 0.8 0.300
So I just want to not output the head, and complete the elements in the corresponding column.
Could you please give me some suggestions about that?
When we try to groupby
is we do not want the groupby
key as index
, we can pass to the as_index=False
key1_sum = df1.groupby([0,1,2,3,4,5],as_index=False).sum()