
Counting qualitative values based on the date range in Pandas


I am learning to use the Pandas library and need to analyze and plot the crime data set below. Each row represents one occurrence of a crime. The date_rep column contains daily dates for a year.

[Image: sample rows of the crime data set]

The data needs to be grouped by month, and the occurrences of each specific crime need to be counted per month, as in the table below.

[Image: table of monthly counts per crime type]

The problem I am running into is that the data in the crime column is qualitative, and I just can't find resources online that help me solve this!

I have been reading up on groupby and different methods of sorting, but what is the most efficient way of accomplishing this? Thank you in advance!


Solution

  • To replicate something like your data:

    In [28]: import numpy as np
        ...: import pandas as pd

    In [29]: df = pd.DataFrame({'date_rep':pd.date_range('2012-01-01', periods=100),
        ...:                    'crm_cd_desc':np.random.choice(['robbery', 'traffic', 'assault'], size=100)})
    
    
    In [30]: df.head()
    Out[30]: 
      crm_cd_desc   date_rep
    0     traffic 2012-01-01
    1     traffic 2012-01-02
    2     assault 2012-01-03
    3     robbery 2012-01-04
    

    In essence, what you want to do is a value counts:

    In [31]: df['crm_cd_desc'].value_counts()
    Out[31]: 
    assault    36
    traffic    34
    robbery    30
    dtype: int64
    

    However, you want to do this for each month separately. To group by month, you can use pd.Grouper inside groupby to specify the month:

    In [34]: df.groupby(pd.Grouper(key='date_rep', freq='M'))['crm_cd_desc'].value_counts()
    Out[34]: 
    date_rep           
    2012-01-31  traffic    12
                robbery    10
                assault     9
    2012-02-29  assault    13
                traffic    11
                robbery     5
    2012-03-31  assault    12
                robbery    10
                traffic     9
    2012-04-30  robbery     5
                assault     2
                traffic     2
    dtype: int64
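
As a side note, in recent pandas releases (2.2 and later) the 'M' offset alias is deprecated in favor of 'ME'. One way to sidestep the alias question is to group on a monthly period instead. The following is only a sketch of that alternative, not part of the original answer; the seeded generator and the variable names `rng`, `month` and `monthly_counts` are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the frame above; the fixed seed is an
# assumption added only to make the run repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "date_rep": pd.date_range("2012-01-01", periods=100),
    "crm_cd_desc": rng.choice(["robbery", "traffic", "assault"], size=100),
})

# Convert each date to a monthly period (2012-01, 2012-02, ...) and group on it.
month = df["date_rep"].dt.to_period("M")
monthly_counts = df.groupby(month)["crm_cd_desc"].value_counts()
print(monthly_counts)
```

The result is again a Series indexed by (month, crime type), so it can be reshaped in the same way as the pd.Grouper result. The only visible difference is that the month labels read 2012-01 rather than 2012-01-31.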
    

    Then unstack the inner index level, so that each crime type becomes a column, to get the result:

    In [35]: df.groupby(pd.Grouper(key='date_rep', freq='M'))['crm_cd_desc'].value_counts().unstack()
    Out[35]: 
                assault  robbery  traffic
    date_rep                             
    2012-01-31        9       10       12
    2012-02-29       13        5       11
    2012-03-31       12       10        9
    2012-04-30        2        5        2
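
One detail worth knowing about unstack: if a crime type does not occur at all in some month, that cell comes out as NaN and the column is upcast to float. Passing fill_value=0 keeps the table as integer counts. A minimal sketch, using a tiny made-up data set (the rows below are hypothetical and chosen so that "robbery" is missing in February) and a monthly period as the grouping key:

```python
import pandas as pd

# Tiny hypothetical data set in which "robbery" never occurs in February.
df = pd.DataFrame({
    "date_rep": pd.to_datetime(
        ["2012-01-05", "2012-01-20", "2012-02-03", "2012-02-14", "2012-02-20"]
    ),
    "crm_cd_desc": ["robbery", "assault", "assault", "assault", "assault"],
})

# Group on the calendar month of each date and count each crime type.
month = df["date_rep"].dt.to_period("M")
counts = df.groupby(month)["crm_cd_desc"].value_counts()

# fill_value=0 replaces the NaN that unstack would otherwise produce
# for (month, crime) pairs that have no rows.
table = counts.unstack(fill_value=0)
print(table)
```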
    

    Instead of using value_counts, you can also group by both the month and the crime type and then calculate the length of each group:

    In [46]: df.groupby([pd.Grouper(key='date_rep', freq='M'), 'crm_cd_desc']).size().unstack()
    Out[46]: 
    crm_cd_desc  assault  robbery  traffic
    date_rep                              
    2012-01-31        9       10       12
    2012-02-29       13        5       11
    2012-03-31       12       10        9
    2012-04-30        2        5        2
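
For completeness, pd.crosstab builds the same kind of month-by-crime frequency table in a single call and fills missing combinations with 0 by default. This is a sketch of an alternative rather than the approach used above; the sample rows are hypothetical.

```python
import pandas as pd

# Tiny hypothetical data set: two crimes in January, three in February.
df = pd.DataFrame({
    "date_rep": pd.to_datetime(
        ["2012-01-05", "2012-01-20", "2012-02-03", "2012-02-14", "2012-02-20"]
    ),
    "crm_cd_desc": ["robbery", "assault", "assault", "traffic", "assault"],
})

# Rows: calendar month of each date; columns: crime type; cells: counts.
table = pd.crosstab(df["date_rep"].dt.to_period("M"), df["crm_cd_desc"])
print(table)
```

Since the question also mentions plotting, something like table.plot(kind="bar") would be one way to chart the result, assuming matplotlib is installed.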