pandas dataframe group-by data-science data-munging

Pandas histogram of number of occurences of other columns after groupby

I have a dataframe:

df = Batch_ID           DateTime            Code A1 A2
      ABC.      '2019-01-02 17:03:41.000'   230  2. 4 
      ABC.      '2019-01-02 17:03:41.000'   230  1. 5 
      ABC.      '2019-01-02 17:03:42.000'   231  1. 4 
      ABC.      '2019-01-02 17:03:48.000'   232  2. 7 
      ABC.      '2019-01-02 17:04:41.000'   230  2. 9 
      ABB.      '2019-01-02 17:04:41.000'   235  5. 4 
      ABB.      '2019-01-02 17:04:45.000'   236  2. 0

I need to generate an plot of an histogram of "number of different codes per <Batch_ID, minute>. Notice that 'Code' may have multiple occurrences but should be taken after unique.

So in this case some entries will be:

<ABC, 2019-01-02 17:03> : 3
<ABC, 2019-01-02 17:04> : 1
<ABB, 2019-01-02 17:04> : 2

How can it be done?

Solution

Try this using pd.Grouper on a datetime dtype column:

df = pd.read_clipboard(sep='\s\s+')

df['DateTime'] = pd.to_datetime(df['DateTime'].str.strip("'"))

df.groupby(['Batch_ID', pd.Grouper(key='DateTime', freq='T')])['Code'].count().rename('Count').reset_index()

Output:

  Batch_ID            DateTime  Count
0     ABB. 2019-01-02 17:04:00      2
1     ABC. 2019-01-02 17:03:00      3
2     ABC. 2019-01-02 17:04:00      1