I have a dataset with about 320k records. I want to draw a swarmplot that shows only the 20 most frequent categories on the x-axis (Refined_ID
in this case), ordered by their count. How can I achieve that? For example, if my data is:
Refined_ID Refined_Age Name
e123 21 foo1
f123 19 bar1
z123 26 foo2
f123 29 bar2
e123 20 foo1
e1342 19 bar3
f123 20 foo3
I would like my x-axis to be ordered as:
e123 f123 z123
This is my code:
import seaborn as sns
g = sns.swarmplot(x=dfAnalysis['Refined_ID'].iloc[:20], y=dfAnalysis['Refined_Age'], hue=dfAnalysis['Name'], orient="v")
g.set_xticklabels(g.get_xticklabels(), rotation=30)
As the dataframe is quite large, I am limiting the view to the first 20 rows for testing.
UPDATE 1
Assuming there isn't a way to dynamically sort the axes in seaborn, this is what I want my output to look like:
Refined_ID Refined_Age Name Count_of_Refined_ID
e123 21 foo1 2
f123 19 bar1 3
z123 26 foo2 1
f123 29 bar2 3
e123 20 foo1 2
e1342 19 bar3 1
f123 20 foo3 3
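A count column like this could be derived with a grouped transform. The following is a minimal sketch, assuming the sample data above is loaded into a DataFrame named df; the column name Count_of_Refined_ID is taken from the table, everything else is illustrative.

```python
import pandas as pd

# Sample data from the question (assumed to match the real column names).
df = pd.DataFrame({
    "Refined_ID": ["e123", "f123", "z123", "f123", "e123", "e1342", "f123"],
    "Refined_Age": [21, 19, 26, 29, 20, 19, 20],
    "Name": ["foo1", "bar1", "foo2", "bar2", "foo1", "bar3", "foo3"],
})

# For every row, store how many rows share that row's Refined_ID.
df["Count_of_Refined_ID"] = (
    df.groupby("Refined_ID")["Refined_ID"].transform("count")
)

print(df)
```

The transform keeps the original row order and index, so the new column lines up with the existing rows without a merge.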
From this dataframe, I would then want to plot the top two Refined_IDs based on their count. In this case, those two categories will be e123 and f123. The plot will have:
x-axis: Refined ID (e123 and f123)
y-axis: Refined_Age (0 to 30)
Hue: Based on Name
Is this what you want?
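# Count how often each Refined_ID occurs
counts = df['Refined_ID'].value_counts()
# Order the row labels by the count of each row's Refined_ID, most frequent first
ix = (df['Refined_ID'].map(counts)
      .sort_values(ascending=False).index)
df.reindex(ix)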
Refined_ID Refined_Age Name
6 f123 20 foo3
3 f123 29 bar2
1 f123 19 bar1
4 e123 20 foo1
0 e123 21 foo1
5 e1342 19 bar3
2 z123 26 foo2
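If the goal is the plot itself, sorting the rows may not be needed: one possible approach is to pick the most frequent Refined_ID values, keep only their rows, and pass the same list to the order parameter of sns.swarmplot. This is only a sketch under the assumption that the data looks like the sample in the question; the helper names top_categories and plot_top_categories are hypothetical.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Refined_ID": ["e123", "f123", "z123", "f123", "e123", "e1342", "f123"],
    "Refined_Age": [21, 19, 26, 29, 20, 19, 20],
    "Name": ["foo1", "bar1", "foo2", "bar2", "foo1", "bar3", "foo3"],
})


def top_categories(frame, column, n):
    """Return the n most frequent values of `column`, most frequent first."""
    # value_counts() sorts by count in descending order by default.
    return frame[column].value_counts().head(n).index.tolist()


def plot_top_categories(frame, n):
    """Swarmplot of Refined_Age for the n most frequent Refined_ID values."""
    # Imported here so the pandas part works without the plotting libraries.
    import seaborn as sns

    top = top_categories(frame, "Refined_ID", n)
    # Keep only the rows that belong to the selected categories.
    subset = frame[frame["Refined_ID"].isin(top)]
    # `order` fixes the left-to-right position of the categories on the x-axis.
    ax = sns.swarmplot(data=subset, x="Refined_ID", y="Refined_Age",
                       hue="Name", order=top)
    ax.tick_params(axis="x", labelrotation=30)
    return ax


top_two = top_categories(df, "Refined_ID", 2)
print(top_two)
```

Calling plot_top_categories(df, 2) on the sample data would draw f123 and e123; for the full dataset, plot_top_categories(dfAnalysis, 20) would cover the top 20. With 320k records a swarmplot may become slow or overcrowded even after filtering, so sampling the subset first might be worth considering.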