I have a dataframe df that looks like this:
time | Hit/Miss |
---|---|
2016-09-29 08:00:00 | FN |
2016-09-29 08:30:00 | FN |
2016-09-29 09:45:00 | TP |
2016-10-05 14:00:00 | FP |
time
is imported straight from a csv file without being set as an index, and Hit/Miss
contains three categorical values FN
, TP
, and FP
.
Now, I'd like to group the data by hour, day, and month and count how many FN
, TP
, FP
's occurred during the hour. As you can see from the brief example dataset. Now, every hour has Hit/Miss
values.
This is what I have so far after grouping by day, which does give me the desired output:
df['time'] = pd.to_datetime(df['time'])
df['year'] = df['time'].dt.year
df['month'] = df['time'].dt.month
df['day'] = df['time'].dt.day
HM_DD = pd.crosstab([df['year'],df['month'],df['day']], df['Hit/Miss'], dropna=True)
HM_MM = pd.crosstab([df['year'],df['month']], df['Hit/Miss'], dropna=True)
HM_YYMM = pd.crosstab(df['year'], df['Hit/Miss'], dropna=True)
HM_DD = HM_DD.reset_index()
And the output looks like the snip shot desired output here:
It does show what I need to see; however, I feel like there's a better way to do it. After all, I need to visualize the data by day and year (extra qn if you have time: What would be the best way/chart to visualize it?), so I'd like to keep the time intact for better maneuver later on, not like how I separated them into the year, month, day. I have tried setting time as an index and resample()
, but that didn't work for me.
I would appreciate the help!
You can use offsets
to aggregate your data:
HM_DD = pd.crosstab(df['time'].dt.normalize(), df['Hit/Miss'])
HM_MM = pd.crosstab(df['time'].dt.date+pd.offsets.MonthBegin(-1), df['Hit/Miss'])
HM_YYMM = pd.crosstab(df['time'].dt.date+pd.offsets.YearBegin(-1), df['Hit/Miss'])
Output:
>>> HM_DD.reset_index().rename_axis(columns=None)
time FN FP TP
0 2016-09-29 2 0 1
1 2016-10-05 0 1 0
>>> HM_MM.reset_index().rename_axis(columns=None)
time FN FP TP
0 2016-09-01 2 0 1
1 2016-10-01 0 1 0
>>> HM_YYMM.reset_index().rename_axis(columns=None)
time FN FP TP
0 2016-01-01 2 1 1