I have the following dataset:
import pandas as pd

A = pd.DataFrame({
    'vol_num': 1.,
    'vol_name': pd.Categorical(["test","train","tt","tn","se","train","tt","test","train","tt"]),
    'lat': [0.188319,0.818803,0.087331,0.305681,0.871307,0.818803,0.087331,0.188319,0.818803,0.087331],
    'lon': [0.959698,0.678901,0.961500,0.229158,0.947383,0.678901,0.961500,0.959698,0.678901,0.961500],
})
For each "vol_name" I have the same "lat" and "lon".
I want to extract the "lat" and "lon" for the top 3 repeated "vol_name" in my dataframe.
The following code gives me the 3 most frequent values.
A['vol_name'].value_counts().head(3)
tt 3
train 3
test 2
Name: vol_name, dtype: int64
However, I don't know how to get each "lat" and "lon".
How can I get the following output, as a dataframe with 3 columns?
tt 0.087331 0.961500
train 0.818803 0.678901
test 0.188319 0.959698
Thank you.
*my real dataset has over 500 rows.
First remove duplicates by vol_name, then reorder the rows by the index idx, and finally remove the column vol_num:
idx = A["vol_name"].value_counts().head(3).index
A = (
    A.drop_duplicates("vol_name")
    .set_index(["vol_name"])
    .reindex(idx)
    .reset_index()
    # pass columns= explicitly; the positional axis argument was removed in pandas 2.0
    .drop(columns="vol_num")
)
print(A)
vol_name lat lon
0 tt 0.087331 0.961500
1 train 0.818803 0.678901
2 test 0.188319 0.959698
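If the counts are also wanted next to the coordinates, a merge-based variant is one possible sketch. It is not the approach above, just an alternative under the same assumption that every vol_name always has the same lat and lon. The names `names`, `top`, `coords`, `result` and the `count` column are chosen here for illustration only.

```python
import pandas as pd

A = pd.DataFrame({
    "vol_num": 1.0,
    "vol_name": pd.Categorical(["test", "train", "tt", "tn", "se", "train", "tt", "test", "train", "tt"]),
    "lat": [0.188319, 0.818803, 0.087331, 0.305681, 0.871307, 0.818803, 0.087331, 0.188319, 0.818803, 0.087331],
    "lon": [0.959698, 0.678901, 0.961500, 0.229158, 0.947383, 0.678901, 0.961500, 0.959698, 0.678901, 0.961500],
})

# Work on plain strings so both sides of the merge share a key dtype.
names = A["vol_name"].astype(str)

# Top 3 names with their counts, as a two-column frame.
top = (
    names.value_counts()
    .head(3)
    .rename_axis("vol_name")
    .reset_index(name="count")
)

# One (lat, lon) row per name; assumes coordinates are constant within a name.
coords = A.assign(vol_name=names).drop_duplicates("vol_name")[["vol_name", "lat", "lon"]]

# A left merge keeps the row order of `top`, i.e. descending count.
result = top.merge(coords, on="vol_name", how="left")
print(result)
```

The order of names with equal counts (here tt and train) follows whatever value_counts returns, so it should not be relied on.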