Tags: python, matplotlib, dataset, seaborn, histogram

Python histogram looks messy and uneven


I have a list of numbers. The list has 74004228 elements, with a minimum value of 1 and a maximum value of 65852. I'm trying to get a sense of how the values are distributed, so I'm using plt.hist() to plot a histogram, but it doesn't give me anything useful.

I'm getting the following histogram, which looks messy: [image: histogram]

matplotlib code:

import matplotlib.pyplot as plt

# One bin per distinct value: every value that occurs becomes a bin edge
unique_values = sorted(set(dataset_length))
bin_edges = unique_values + [unique_values[-1] + 1]

plt.hist(dataset_length, bins=bin_edges, log=True)  # log=True puts the count axis on a log scale
plt.title('Histogram of Data')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.show()

I've also tried:

sns.displot(dataset_length)

The sns.displot call gives me an empty plot, as shown below: [image: sns plot]

Is there a solution for this?


Solution

  • I took the log of the data. It looks better, but then I cannot see the exact distribution for each index.

    So instead I used a dict to count the values, as follows:

    count_dataset_dict = {}
    for item in dataset_length:
      if item in count_dataset_dict:
        count_dataset_dict[item] += 1
      else:
        count_dataset_dict[item] = 1
    

    Then I converted that to a pandas DataFrame and kept only the values with a count above 1000:

    count_df_more_1K = count_df[count_df['count']>1000]
    

    Then I plotted the data:

    count_df_more_1K.sort_index().plot(kind='bar', figsize=(18, 8))
    plt.title("Distribution of Index")
    plt.xlabel("Index")
    plt.ylabel("Count")
    

    which looks something like this:

    [image: plot]
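
The answer does not show how the dict becomes count_df, so here is a minimal sketch of the counting, conversion, and filtering steps end to end. It is one possible way to do it, not necessarily what the author ran. The toy dataset_length list, the threshold of 2, and the names count_df_filtered and "value" are assumptions made for illustration; the 'count' column name follows the filter line in the answer. collections.Counter is swapped in for the manual loop and produces the same value-to-count mapping.

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-in for the real 74-million-element list.
dataset_length = [1, 1, 2, 2, 2, 5, 5, 5, 5, 9]

# Counter builds the same value -> count mapping as the manual dict loop.
count_dataset_dict = Counter(dataset_length)

# Assumed conversion step: one row per distinct value, indexed by that value,
# with the number of occurrences in a column named 'count'.
count_df = pd.Series(count_dataset_dict, name="count").to_frame()
count_df.index.name = "value"

# Keep only values that occur more often than the threshold.
# The answer uses 1000; 2 is used here so the toy data leaves rows to show.
threshold = 2
count_df_filtered = count_df[count_df["count"] > threshold].sort_index()

print(count_df_filtered)
```

With the real data, setting threshold to 1000 should give a frame equivalent to count_df_more_1K, which can then be passed to the .plot(kind='bar', ...) call from the answer.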