python, plot, seaborn, data-cleaning

How to keep only some values in a column and plot them using Python


I have a dataframe with 180,000 rows and multiple columns. One column, "home_state", contains the names of forty different cities. Only four cities appear frequently; the others appear rarely. When I plot with the following code, the result doesn't look good because most of the cities have very few customers, and I don't really need them on the plot.

import matplotlib.pyplot as plt
import seaborn as sns

# df is the dataframe described above, with a 'home_state' column
plt.figure(num=None, figsize=(8,10), dpi=80, facecolor='w', edgecolor='r')
sns.countplot(y = 'home_state', data = df)
plt.title('Total Number of Customers',size=18)
plt.ylabel('home_state',size=14)
plt.show()

So my question is: how can I plot only the four cities that appear most often? Sorry for not being able to share the data.

Histogram of only four cities.


Solution

  • You can set a lower limit so that only cities with more than a set number of customers (e.g. 10000) are plotted. The snippet assumes df has a no_of_customers column holding that count for each row.

    plot_df = df[df.no_of_customers > 10000]
    sns.countplot(y = 'home_state', data = plot_df)
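
If the dataframe has no precomputed customer count, the same idea can be sketched by counting rows per city with value_counts and keeping only the most frequent ones. This is a minimal sketch, not a definitive implementation: keep_top_categories and plot_top_categories are hypothetical helper names, and the sample dataframe is made up because the real data was not shared.

```python
import pandas as pd


def keep_top_categories(df, column, n=4):
    """Return the rows whose `column` value is among the n most frequent,
    together with those values ordered from most to least frequent."""
    # value_counts() sorts by frequency, descending, so head(n) is the top n.
    top = df[column].value_counts().head(n).index
    return df[df[column].isin(top)], list(top)


def plot_top_categories(df, column, n=4):
    # Plotting libraries are imported here so the helper above
    # can be used without them.
    import matplotlib.pyplot as plt
    import seaborn as sns

    plot_df, order = keep_top_categories(df, column, n)
    plt.figure(figsize=(8, 10), dpi=80)
    # Passing `order` keeps the bars sorted by frequency.
    sns.countplot(y=column, data=plot_df, order=order)
    plt.title('Total Number of Customers', size=18)
    plt.ylabel(column, size=14)
    plt.show()


# Made-up sample standing in for the real 180,000-row dataframe.
sample = pd.DataFrame(
    {'home_state': ['A'] * 5 + ['B'] * 4 + ['C'] * 3 + ['D'] * 2 + ['E']}
)
top_df, top_order = keep_top_categories(sample, 'home_state', n=4)
```

With the real data, calling plot_top_categories(df, 'home_state') should draw only the four most frequent cities. To keep a fixed threshold instead of a fixed number of cities, the head(n) step could be swapped for a filter such as counts[counts > 10000].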