I am learning to use the Pandas library and need to analyse and plot the crime data set below. Each row represents one occurrence of a crime, and the date_rep column contains daily dates for a year.
The data needs to be grouped by month, and the instances of each specific crime need to be added up per month, like in the table below.
The problem I am running into is that the data in the crime column is qualitative, and I just can't find resources online that can help me solve this!
I have been reading up on groupby and different methods of sorting, but what is the most efficient way of accomplishing this? Thank you in advance!
To replicate something like your data:
In [28]: import pandas as pd
    ...: import numpy as np
In [29]: df = pd.DataFrame({'date_rep': pd.date_range('2012-01-01', periods=100),
    ...:                    'crm_cd_desc': np.random.choice(['robbery', 'traffic', 'assault'], size=100)})
In [30]: df.head()
Out[30]:
  crm_cd_desc   date_rep
0     traffic 2012-01-01
1     traffic 2012-01-02
2     assault 2012-01-03
3     robbery 2012-01-04
In essence, what you want to do is a value counts:
In [31]: df['crm_cd_desc'].value_counts()
Out[31]:
assault    36
traffic    34
robbery    30
dtype: int64
However, you want to do this for each month separately. To group by month, you can use pd.Grouper inside groupby to specify a monthly frequency:
In [34]: df.groupby(pd.Grouper(key='date_rep', freq='M'))['crm_cd_desc'].value_counts()
Out[34]:
date_rep
2012-01-31  traffic    12
            robbery    10
            assault     9
2012-02-29  assault    13
            traffic    11
            robbery     5
2012-03-31  assault    12
            robbery    10
            traffic     9
2012-04-30  robbery     5
            assault     2
            traffic     2
dtype: int64
And then unstack to get the result:
In [35]: df.groupby(pd.Grouper(key='date_rep', freq='M'))['crm_cd_desc'].value_counts().unstack()
Out[35]:
            assault  robbery  traffic
date_rep
2012-01-31        9       10       12
2012-02-29       13        5       11
2012-03-31       12       10        9
2012-04-30        2        5        2
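One caveat: if a month happens to contain no occurrences of a particular crime, the unstacked frame will show NaN in that cell. Assuming you would rather see zeros there, unstack takes a fill_value argument; a small tweak of the line above (output not shown) would be:

df.groupby(pd.Grouper(key='date_rep', freq='M'))['crm_cd_desc'].value_counts().unstack(fill_value=0)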
Instead of using value_counts, you can also group by both the month and the crime type and then calculate the length of each group:
In [46]: df.groupby([pd.Grouper(key='date_rep', freq='M'), 'crm_cd_desc']).size().unstack()
Out[46]:
crm_cd_desc  assault  robbery  traffic
date_rep
2012-01-31         9       10       12
2012-02-29        13        5       11
2012-03-31        12       10        9
2012-04-30         2        5        2
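Since you also mentioned plotting, either of the unstacked frames above can be passed straight to pandas' built-in plotting. A minimal sketch, assuming matplotlib is installed (the variable name counts is just for illustration):

import matplotlib.pyplot as plt

counts = df.groupby([pd.Grouper(key='date_rep', freq='M'), 'crm_cd_desc']).size().unstack(fill_value=0)
counts.plot(kind='bar')  # one cluster of bars per month, one bar per crime type
plt.show()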