Is it possible to group rows by key without changing any column other than the key column, which would become the index? If so, how can we do it?
import pandas as pd

df = pd.DataFrame({
    'id': ['A','A','A','B','B','C','C','C','C'],
    'data1': [11,35,46,11,26,25,39,50,55],
    'data2': [1,1,1,1,1,2,2,2,2],
})
df
I want a frame with ['A', 'B', 'C'] as the index, where every row of data1 and data2 is stored under index A if id=A, index B if id=B, and index C if id=C. Something like this:
   data1  data2
A     11      1
      35      1
      46      1
B     11      1
      26      1
C     25      2
      39      2
      50      2
      55      2
Why not set id as the index? Like so:
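import pandas as pd

df = pd.DataFrame({
    'id': ['A','A','A','B','B','C','C','C','C'],
    'data1': [11,35,46,11,26,25,39,50,55],
    'data2': [1,1,1,1,1,2,2,2,2],
})
# Move 'id' into the index; data1 and data2 are left unchanged
df.set_index(['id'], inplace=True)
# Select every row whose index label is 'A'
df[df.index.isin(['A'])]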
Output 1:
    data1  data2
id
A      11      1
A      35      1
A      46      1
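Once id is the index, the rows for one key can also be pulled out by label, and all groups can be collected in one pass. The following is only a sketch of that idea, not part of the original answer; the names indexed, rows_a and groups are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
    'data1': [11, 35, 46, 11, 26, 25, 39, 50, 55],
    'data2': [1, 1, 1, 1, 1, 2, 2, 2, 2],
})

# Non-mutating variant of set_index: the original df is left as it was
indexed = df.set_index('id')

# Passing a list of labels keeps the result a DataFrame,
# even when only a single row matches
rows_a = indexed.loc[['A']]

# Iterating a groupby on the index level yields one (key, sub-frame)
# pair per distinct id; data1 and data2 are not aggregated or modified
groups = {key: sub for key, sub in indexed.groupby(level='id')}
```

Here groups['B'] would hold the two rows whose id is B, with the columns untouched.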
Alternatively, you could create a fake MultiIndex:
import pandas as pd

df = pd.DataFrame({
    'id': ['A','A','A','B','B','C','C','C','C'],
    'data1': [11,35,46,11,26,25,39,50,55],
    'data2': [1,1,1,1,1,2,2,2,2],
})
# Create an empty column to serve as the second index level
df['empty'] = ''
# Create the MultiIndex
df.set_index(['id','empty'], inplace=True)
# Rename the first level to None if you don't want an index name
df.index.set_names(None, level=0, inplace=True)
# Query like this
df.loc[df.index.get_level_values(0) == 'A']
# Or like this, dropping the empty second level from the result
df.loc[df.index.get_level_values(0) == 'A'].droplevel(1)