I am new to python and I'm trying to plot an overlaid histogram for a manipulated data set from Kaggle
. I tried doing it with matplotlib
. This is a dataset that shows the history of gun violence in USA in recent years. I have selected only few columns for EDA
.
import pandas as pd
data_set = pd.read_csv("C:/Users/Lenovo/Documents/R related
Topics/Assignment/Assignment_day2/04 Assignment/GunViolence.csv")
state_wise_crime = data_set[['date', 'state', 'n_killed', 'n_injured']]
date_value = pd.to_datetime(state_wise_crime['date'])
import datetime
state_wise_crime['Month']= date_value.dt.month
state_wise_crime.drop('date', axis = 1)
no_of_killed = state_wise_crime.groupby(['state','Year'])
['n_killed','n_injured'].sum()
no_of_killed = state_wise_crime.groupby(['state','Year']
['n_killed','n_injured'].sum()
Welcome to Stack Overflow! From next time, please post your data like in below format (not a link or an image) to make us easier to work on the problem. Also, if you ask about a graph output, showing the contents of desired graph (even with hand drawing) would be very helpful :)
df
state Year n_killed n_injured
0 Alabama 2013 9 3
1 Alabama 2014 591 325
2 Alabama 2015 562 385
3 Alabama 2016 761 488
4 Alabama 2017 856 544
5 Alabama 2018 219 135
6 Alaska 2014 49 29
7 Alaska 2015 84 70
8 Alaska 2016 103 88
9 Alaska 2017 70 69
As I commented in your original post, a bar plot would be more appropriate than histogram in this case since your purpose appears to be visualizing the summary statistics (sum) of each year with state-wise comparison. As far as I know, the easiest option is to use Seaborn. It depends on how you want to show the data, but below is one example. The code is as simple as below.
import seaborn as sns
sns.barplot(x='Year', y='n_killed', hue='state', data=df)
Output:
Hope this helps.