I have a dataframe with 180,000 rows and multiple columns. One column, "home_state", contains the names of forty different cities. Only four cities appear frequently; the others appear rarely. When I plot it with the following code, the result doesn't look good because most of the cities have very few customers, and I don't really need them on the plot.
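import matplotlib.pyplot as plt
import seaborn as sns

# Tall figure so the city labels on the y-axis have room
plt.figure(num=None, figsize=(8, 10), dpi=80, facecolor='w', edgecolor='r')
# One bar per city; bar length is the number of rows for that city
sns.countplot(y='home_state', data=df)
plt.title('Total Number of Customers', size=18)
plt.ylabel('home_state', size=14)
plt.show()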
So, my question is: how can I plot only the four cities that appear most often? Sorry for not being able to share the data.
Histogram of only four cities.
You can set a lower limit so that only cities with more than a set number of customers (e.g. 10000) are plotted.
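# Assuming one row per customer: count rows per city, then keep cities above the threshold
counts = df['home_state'].value_counts()
plot_df = df[df['home_state'].isin(counts[counts > 10000].index)]
sns.countplot(y='home_state', data=plot_df)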
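If you would rather not pick a threshold by hand, another option is to keep the N most frequent cities directly. The sketch below is only an illustration under some assumptions: the helper names `keep_top_n` and `plot_top` are made up for this example, the small dataframe is synthetic stand-in data (the real data was not shared), and each row is assumed to represent one customer.

```python
import pandas as pd


def keep_top_n(df, column, n=4):
    """Return only the rows whose value in `column` is among the n most frequent."""
    # value_counts() counts rows per distinct value; nlargest(n) keeps the n biggest counts
    top_values = df[column].value_counts().nlargest(n).index
    return df[df[column].isin(top_values)]


def plot_top(df, column):
    """Draw a horizontal count plot, most frequent value first (hypothetical helper)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    order = df[column].value_counts().index  # sorted by descending count
    plt.figure(figsize=(8, 10), dpi=80)
    sns.countplot(y=column, data=df, order=order)
    plt.title('Total Number of Customers', size=18)
    plt.ylabel(column, size=14)
    plt.show()


# Synthetic stand-in data: four frequent cities and two rare ones
df = pd.DataFrame({
    'home_state': ['A'] * 50 + ['B'] * 40 + ['C'] * 30 + ['D'] * 20 + ['E'] * 2 + ['F'] * 1
})

top_df = keep_top_n(df, 'home_state', n=4)
```

Calling `plot_top(top_df, 'home_state')` should then draw only the four frequent cities. Note that if several cities tie for fourth place, `nlargest` keeps just the first of them.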