python, pandas, matplotlib, kaggle

Using matplotlib to obtain an overlaid histogram


I am new to Python and am trying to plot an overlaid histogram, using matplotlib, for a manipulated dataset from Kaggle. The dataset records the history of gun violence in the USA in recent years. I have selected only a few columns for EDA.

 import pandas as pd

 data_set = pd.read_csv("C:/Users/Lenovo/Documents/R related Topics/Assignment/Assignment_day2/04 Assignment/GunViolence.csv")

 # Keep only the columns needed for the EDA; copy to avoid SettingWithCopyWarning
 state_wise_crime = data_set[['date', 'state', 'n_killed', 'n_injured']].copy()

 # Parse the dates and derive the month and year of each incident
 date_value = pd.to_datetime(state_wise_crime['date'])
 state_wise_crime['Month'] = date_value.dt.month
 state_wise_crime['Year'] = date_value.dt.year
 state_wise_crime = state_wise_crime.drop('date', axis=1)

 # Total number of people killed and injured per state and year
 no_of_killed = state_wise_crime.groupby(['state', 'Year'])[['n_killed', 'n_injured']].sum()

I want an overlaid histogram that shows the number of people killed and the number of people injured, with the different states on the x-axis.


Solution

  • Welcome to Stack Overflow! In future, please post your data as text in a format like the one below (not as a link or an image) so that it is easier for us to work on the problem. Also, if you are asking about a graph, showing what the desired graph should look like (even as a hand drawing) would be very helpful :)


    df

        state   Year    n_killed    n_injured
    0   Alabama 2013    9           3
    1   Alabama 2014    591         325
    2   Alabama 2015    562         385
    3   Alabama 2016    761         488
    4   Alabama 2017    856         544
    5   Alabama 2018    219         135
    6   Alaska  2014    49          29
    7   Alaska  2015    84          70
    8   Alaska  2016    103         88
    9   Alaska  2017    70          69
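
One way to produce pasteable sample data like the table above is to print the first rows as plain text. This is only a minimal sketch: it assumes the aggregated frame is called `df`, reproduces just three rows of the table, and the names `sample_text` and `restored` are introduced here for illustration.

```python
import io

import pandas as pd

# Three rows copied from the table above, for illustration only.
df = pd.DataFrame({
    "state": ["Alabama", "Alabama", "Alaska"],
    "Year": [2013, 2014, 2014],
    "n_killed": [9, 591, 49],
    "n_injured": [3, 325, 29],
})

# Plain-text rendering that can be pasted straight into a question.
sample_text = df.head(10).to_string()
print(sample_text)

# A reader can rebuild the frame from the pasted text. The data rows have
# one more field than the header, so pandas uses the first field as the index.
restored = pd.read_csv(io.StringIO(sample_text), sep=r"\s+")
```

Note that whitespace-separated parsing like this would split state names that contain spaces, so such values may need another delimiter.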
    

    As I commented on your original post, a bar plot is more appropriate than a histogram in this case, since your purpose appears to be visualizing summary statistics (the yearly sums) with a state-wise comparison. As far as I know, the easiest option is to use Seaborn. The layout depends on how you want to show the data, but here is one example:

    import matplotlib.pyplot as plt
    import seaborn as sns
    sns.barplot(x='Year', y='n_killed', hue='state', data=df)  # bars grouped by year, colored by state
    plt.show()
    

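
For the chart the question describes (killed and injured together, with states on the x-axis), a plain matplotlib sketch could look like the following. It assumes the same `df` layout as above (only four rows are reproduced here), sums over years per state, and draws both series as semi-transparent bars on the same axes so that they overlap. The names `totals` and `ax`, the alpha value, and the output file name are illustrative choices, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import pandas as pd

# A few rows copied from the table above, for illustration only.
df = pd.DataFrame({
    "state": ["Alabama", "Alabama", "Alaska", "Alaska"],
    "Year": [2016, 2017, 2016, 2017],
    "n_killed": [761, 856, 103, 70],
    "n_injured": [488, 544, 88, 69],
})

# Sum both measures over all years for each state.
totals = df.groupby("state")[["n_killed", "n_injured"]].sum()

fig, ax = plt.subplots()
# Draw both series at the same x positions; alpha lets the bars show through.
ax.bar(totals.index, totals["n_killed"], alpha=0.5, label="killed")
ax.bar(totals.index, totals["n_injured"], alpha=0.5, label="injured")
ax.set_xlabel("state")
ax.set_ylabel("number of people")
ax.legend()
fig.savefig("killed_vs_injured.png")
```

With many states, rotating the tick labels (for example with `ax.tick_params(axis="x", rotation=90)`) would probably be needed to keep them readable.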

    Hope this helps.