Search code examples
pandasdataframegroup-bydata-sciencedata-munging

Pandas histogram of number of occurences of other columns after groupby


I have a dataframe:

df = Batch_ID           DateTime            Code A1 A2
      ABC.      '2019-01-02 17:03:41.000'   230  2. 4 
      ABC.      '2019-01-02 17:03:41.000'   230  1. 5 
      ABC.      '2019-01-02 17:03:42.000'   231  1. 4 
      ABC.      '2019-01-02 17:03:48.000'   232  2. 7 
      ABC.      '2019-01-02 17:04:41.000'   230  2. 9 
      ABB.      '2019-01-02 17:04:41.000'   235  5. 4 
      ABB.      '2019-01-02 17:04:45.000'   236  2. 0 

I need to generate an plot of an histogram of "number of different codes per <Batch_ID, minute>. Notice that 'Code' may have multiple occurrences but should be taken after unique.

So in this case some entries will be:

<ABC, 2019-01-02 17:03> : 3
<ABC, 2019-01-02 17:04> : 1
<ABB, 2019-01-02 17:04> : 2

How can it be done?


Solution

  • Try this using pd.Grouper on a datetime dtype column:

    df = pd.read_clipboard(sep='\s\s+')
    
    df['DateTime'] = pd.to_datetime(df['DateTime'].str.strip("'"))
    
    df.groupby(['Batch_ID', pd.Grouper(key='DateTime', freq='T')])['Code'].count().rename('Count').reset_index()
    

    Output:

      Batch_ID            DateTime  Count
    0     ABB. 2019-01-02 17:04:00      2
    1     ABC. 2019-01-02 17:03:00      3
    2     ABC. 2019-01-02 17:04:00      1