I am trying to plot a simple date histogram (just counting occurrences per N minutes) using pandas. However, the best I have reached is the following:
import pandas as pd
df = pd.read_csv('mydata.csv',sep=' ',usecols=[0,1])
df.columns = ['smdate', 'smtime']
df= pd.to_datetime(df['smtime'])
print(df)
df.groupby(df.dt.minute).count().plot(kind="bar",figsize=(50,10))
Sample output:
0 2020-07-07 00:00:07.538
1 2020-07-07 00:00:09.278
2 2020-07-07 00:00:09.292
3 2020-07-07 00:00:10.682
4 2020-07-07 00:00:14.198
...
262516 2020-07-07 15:54:44.056
262517 2020-07-07 15:54:44.270
262518 2020-07-07 15:54:44.450
262519 2020-07-07 15:54:44.697
262520 2020-07-07 15:54:45.210
Name: smtime, Length: 262521, dtype: datetime64[ns]
My question is: how can I best group my data (e.g. into 10- or 20-minute intervals)? Also, what is the best way to plot this many occurrences so that I can visualize the counts per interval (per N minutes)?
I think you are looking for pandas Grouper.
It allows you to specify any frequency or interval needed.
Here is a working example with a 10-minute interval:
import pandas as pd
df = pd.read_csv('mydata.csv',sep=';',usecols=[0,1])
df.columns = ['smdate', 'smtime']
df['smtime'] = pd.to_datetime(df['smtime'])
df.groupby(pd.Grouper(key='smtime', freq='10Min')).count().plot(kind="bar",figsize=(50,10))
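To make the binning easier to check without the CSV, here is a minimal sketch of the same Grouper call on a few made-up timestamps. The values are invented for illustration, and I use size() instead of count() so the result is a single Series of row counts per bin rather than one count per remaining column.

```python
import pandas as pd

# Made-up timestamps standing in for the 'smtime' column (not the real data).
df = pd.DataFrame({
    "smtime": pd.to_datetime([
        "2020-07-07 00:00:07.538",
        "2020-07-07 00:04:09.278",
        "2020-07-07 00:12:10.682",
        "2020-07-07 00:31:14.198",
    ])
})

# size() counts the rows that fall into each 10-minute bin.
counts = df.groupby(pd.Grouper(key="smtime", freq="10min")).size()
print(counts)
```

Calling counts.plot(kind="bar") on the result should then draw one bar per 10-minute bin; changing freq to "20min" widens the bins.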
In the working example, I kept the initial DataFrame structure; I couldn't get it to work with the datetime Series object (without a key, Grouper works on the index and not on the values of the Series). I tried the axis parameter without success. I would be glad if anyone could improve my answer by working directly with the Series.
Non-working example:
import pandas as pd
df = pd.read_csv('mydata.csv',sep=';',usecols=[0,1])
df.columns = ['smdate', 'smtime']
df = pd.to_datetime(df['smtime'])
df.groupby(pd.Grouper(freq='10Min')).count().plot(kind="bar",figsize=(50,10))
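For what it's worth, here are two possible ways to work with the Series directly. This is only a sketch on made-up timestamps, not something tested against the original file. The first rounds each timestamp down to its 10-minute bin with dt.floor and groups on that; the second moves the timestamps into the index, which is what resample (like Grouper without a key) expects.

```python
import pandas as pd

# Made-up timestamps standing in for the 'smtime' Series (not the real data).
s = pd.Series(
    pd.to_datetime([
        "2020-07-07 00:00:07.538",
        "2020-07-07 00:04:09.278",
        "2020-07-07 00:12:10.682",
        "2020-07-07 00:31:14.198",
    ]),
    name="smtime",
)

# Option 1: round every timestamp down to the start of its 10-minute bin,
# then count how many values share each bin start. Bins with no data are absent.
by_floor = s.groupby(s.dt.floor("10min")).size()

# Option 2: put the timestamps in the index of a Series of ones and resample.
# Empty bins between the first and last timestamp show up with a sum of 0.
by_resample = pd.Series(1, index=pd.DatetimeIndex(s)).resample("10min").sum()

print(by_floor)
print(by_resample)
```

Which one fits better depends on the plot: the resample version keeps empty intervals, so gaps in the data stay visible on the bar chart, while the floor version only shows intervals that contain at least one row.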