python, datatable, data-science, histogram

Histogram from Table


[Image: Table]

Hi, I'm trying to make a histogram from the table above. Here is my code:

def histograms(t):
    salaries = t.column('Salary')
    salary_bins = np.arange(min(salaries), max(salaries)+1000, 1000)
    t.hist('Salary', bins=salary_bins, unit='$')
    
histograms(full_data)

But it's not showing properly. Can you help me?

[Image: Histogram]


Solution

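  • In a histogram, the bins argument controls how the data range is split up. If you pass an integer, it is the number of equal-width bins the range is divided into. If you pass a sequence, as in your code, it is the list of bin edges.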

    Let's say you have a sample dataframe of salaries like this:

    import pandas as pd
    
    sample_dataframe = pd.DataFrame({'name':['joe','jill','martin','emily','frank','john','sue','sally','sam'],
                                     'salary':[105324,65002,98314,24480,55000,62000,75000,79000,32000]})
    
    #output:
         name  salary
    0     joe  105324
    1    jill   65002
    2  martin   98314
    3   emily   24480
    4   frank   55000
    5    john   62000
    6     sue   75000
    7   sally   79000
    8     sam   32000
    

    If you want to plot a histogram where the salaries are grouped into 10 equal-width bins, and you want to stick with your function, you can do:

    import matplotlib.pyplot as plt
    
    def histograms(t):
        plt.hist(t.salary, bins = 10, color = 'orange', edgecolor = 'black')
        plt.xlabel('Salary')
        plt.ylabel('Count')
        plt.show()
    
    histograms(sample_dataframe)
    

    If you want the x-axis ticks to mark the boundaries of the 10 bins, add these lines inside the function, before plt.show():

    import numpy as np
    plt.xticks(np.linspace(min(t.salary), max(t.salary), 11), rotation = 45)
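
A quick way to check that those tick positions really are the bin edges is to compute the bins without plotting. The sketch below assumes the same nine sample salaries as above and uses `np.histogram`, which applies the same rule for an integer `bins` value: 10 equal-width bins between the minimum and maximum, giving 11 edges.

```python
import numpy as np

# Same sample salaries as in the dataframe above
salaries = np.array([105324, 65002, 98314, 24480, 55000, 62000, 75000, 79000, 32000])

# An integer `bins` asks for that many equal-width bins across [min, max]
counts, edges = np.histogram(salaries, bins=10)

# The tick positions from the xticks call: 10 bins have 11 edges
tick_positions = np.linspace(salaries.min(), salaries.max(), 11)

print(edges)                                # the 11 bin boundaries
print(np.allclose(edges, tick_positions))   # do the ticks sit on the boundaries?
print(counts.sum() == len(salaries))        # is every salary in exactly one bin?
```

The bins follow the data range rather than round numbers, which is why the tick labels come out as uneven-looking values.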
    

    Finally, to show the y-axis ticks as integers, add these lines:

    from matplotlib.ticker import MaxNLocator
    plt.gca().yaxis.set_major_locator(MaxNLocator(integer=True))
    

    The final function looks like this:

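    import matplotlib.pyplot as plt
    import numpy as np
    from matplotlib.ticker import MaxNLocator
    
    def histograms(t):
        # 10 equal-width bins spanning the salary range
        plt.hist(t.salary, bins = 10, color = 'orange', edgecolor = 'black')
        plt.xlabel('Salary')
        plt.ylabel('Count')
    
        # integer y-ticks; x-ticks at the 11 bin edges
        plt.gca().yaxis.set_major_locator(MaxNLocator(integer=True))
        plt.xticks(np.linspace(min(t.salary), max(t.salary), 11), rotation = 45)
        plt.show()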
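
If you would rather keep the approach from the question and pass explicit bin edges, a wider step than 1000 may help, since salaries that span tens of thousands would otherwise be split into many very narrow bins. The following is only a sketch: `salary_bin_edges` is a hypothetical helper, and the width of 10,000 is an assumption chosen to suit the sample data, not a value taken from your table.

```python
import math
import numpy as np

def salary_bin_edges(salaries, width):
    """Hypothetical helper: bin edges at round multiples of `width` covering all salaries."""
    lo = math.floor(min(salaries) / width) * width   # round the minimum down
    hi = math.ceil(max(salaries) / width) * width    # round the maximum up
    hi = max(hi, lo + width)                         # always keep at least one bin
    return np.arange(lo, hi + width, width)

# Same sample salaries as above; 10_000 is an assumed bin width
salaries = [105324, 65002, 98314, 24480, 55000, 62000, 75000, 79000, 32000]
edges = salary_bin_edges(salaries, 10_000)

# Count the salaries per bin without plotting
counts, _ = np.histogram(salaries, bins=edges)
print(edges)
print(counts)
```

The resulting `edges` array can be passed as `bins=edges` to `plt.hist`, or as the `bins` argument in the `t.hist` call from the question, so the bars line up with round salary values.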
    
