Search code examples
pythonpandasplothistogramdate-histogram

Date histogram per minutes in pandas


I am trying to plot a simple date histogram(just count occurences per N minutes) using pandas. However the best I have reach is the following:

import pandas as pd
df = pd.read_csv('mydata.csv',sep=' ',usecols=[0,1])
df.columns = ['smdate', 'smtime']
df= pd.to_datetime(df['smtime'])
print(df)
df.groupby(df.dt.minute).count().plot(kind="bar",figsize=(50,10))

Sample output:

0        2020-07-07 00:00:07.538
1        2020-07-07 00:00:09.278
2        2020-07-07 00:00:09.292
3        2020-07-07 00:00:10.682
4        2020-07-07 00:00:14.198
                   ...          
262516   2020-07-07 15:54:44.056
262517   2020-07-07 15:54:44.270
262518   2020-07-07 15:54:44.450
262519   2020-07-07 15:54:44.697
262520   2020-07-07 15:54:45.210
Name: smtime, Length: 262521, dtype: datetime64[ns]

Plot enter image description here

My question is how can I group my frame best(i.e. by 10-20 minutes)? Also what is the best way to plot so many occurrences so that I can visualize them per the interval mentioned above(per N minutes)?


Solution

  • I think you are looking for pandas Grouper.
    It allows you to specify any frequency or interval needed.

    Here is a working example with 10 minutes interval :

    import pandas as pd
    df = pd.read_csv('mydata.csv',sep=';',usecols=[0,1])
    df.columns = ['smdate', 'smtime']
    
    df['smtime'] = pd.to_datetime(df['smtime'])
    
    df.groupby(pd.Grouper(key='smtime', freq='10Min')).count().plot(kind="bar",figsize=(50,10))
    

    Here, I kept the initial dataframe structure ; I couldn't get it to work with the datetime Series object (Grouper function tries to work on index and not values of the serie). I tried axis parameter without success. I would be glad if anyone could improve my answer working directly with the Series.

    Not working example :

    import pandas as pd
    df = pd.read_csv('mydata.csv',sep=';',usecols=[0,1])
    df.columns = ['smdate', 'smtime']
    
    df = pd.to_datetime(df['smtime'])
    
    df.groupby(pd.Grouper(freq='10Min')).count().plot(kind="bar",figsize=(50,10))