I have a list of date-times corresponding to events that occurred over multiple days, and I am hoping to find the mean time of day at which events in each category occurred, independent of date (e.g., "the mean time for events falling into category A was 11:04:42"). I have a data frame that looks similar to the following:
library(lubridate)
library(dplyr)

df <- data.frame(
  category = c("A", "A", "B", "A", "C", "C", "B", "D", "A", "D", "D", "C"),
  times = ymd_hms(c("2021-09-12 21:34:22", "2021-09-13 15:42:37",
                    "2021-09-16 22:36:50", "2021-09-24 09:41:00",
                    "2021-09-20 12:14:30", "2021-09-15 16:40:39",
                    "2021-09-15 09:16:39", "2021-09-14 15:50:47",
                    "2021-09-24 18:10:00", "2021-09-21 17:30:00",
                    "2021-09-14 17:43:53", "2021-09-23 19:00:00"))
)
I would like to find the mean time for all events in category A, but when I call something like mean(times), the output is a date and a time, whereas I would just like a time, independent of what day each event occurred.
As an example, I have tried summarizing the data frame, like so:
summary_times <- df %>%
  group_by(category) %>%
  summarize(avg_time = mean(times))
The result is still a full date and time, such as "2021-09-18 06:20:06 UTC", which is not what I would like; I'm interested in generalizing to any given day, so I'm hoping for a time that does not take the dates of the individual events into account.
I have also tried taking the means of the hours, minutes, and seconds individually and then reassembling them into a time, but I have not been successful with that, either. My first attempt looked like this:
summary_times <- df %>%
  group_by(category) %>%
  summarize(avg_time = paste(mean(hour(times)), ":",
                             mean(minute(times)), ":",
                             mean(second(times))))
This gave me a "time" (just as a character object, which is fine with me; this is just being displayed in a table), but the hour, minute, and second components each had decimal remainders. That problem led me to try this next iteration:
summary_times <- df %>%
  group_by(category) %>%
  summarize(avg_time = paste(sum(hour(times)) %/% n(), ":",
                             sum(minute(times)) %/% n() + (sum(hour(times)) %% n()) * 60, ":",
                             sum(second(times)) %/% n() + (sum(minute(times)) %% n()) * 60))
I no longer got decimal remainders on each component of the time; however, some of the components were larger than they could possibly be (e.g., a "time" of "15:247:130"), presumably because the leftover hours and minutes get converted to minutes and seconds and added on in full rather than being averaged.
Any assistance in finding this mean time of day for events, whether by pointing me toward a function that can perform this task or by sorting out the mean-of-the-individual-components approach, would be greatly appreciated!
An option is to convert the date-times to ITime (data.table's time-of-day class) and then take the mean:
library(data.table)
library(dplyr)

df %>%
  group_by(category) %>%
  summarise(avg_time = mean(as.ITime(times)))
-output
# A tibble: 4 × 2
category avg_time
<chr> <ITime>
1 A 16:16:59
2 B 15:56:44
3 C 15:58:23
4 D 17:01:33
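as.ITime() drops the date component and keeps only the clock time, stored as integer seconds since midnight, which is why mean() here averages just the time of day. A quick check on one value (assuming data.table and lubridate are loaded as above):
as.ITime(ymd_hms("2021-09-12 21:34:22"))
# [1] "21:34:22"
unclass(as.ITime(ymd_hms("2021-09-12 21:34:22")))
# [1] 77662   (21*3600 + 34*60 + 22 seconds since midnight)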
Or another option is to change the 'date' part of each timestamp to a single standardized date, take the mean, and then use format to return only the 'time' part:
df %>%
  group_by(category) %>%
  summarise(times = format(mean(as.POSIXct(format(times,
                                                  '2021-09-01 %H:%M:%S'))),
                           '%H:%M:%S'))
# A tibble: 4 × 2
category times
<chr> <chr>
1 A 16:16:59
2 B 15:56:44
3 C 15:58:23
4 D 17:01:33
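The inner format() call writes the literal date '2021-09-01' (any fixed date would do) in front of each event's clock time, so after parsing everything back with as.POSIXct() the dates are identical and only the time of day contributes to the mean. For example:
format(ymd_hms("2021-09-24 09:41:00"), '2021-09-01 %H:%M:%S')
# [1] "2021-09-01 09:41:00"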
Or do this in base R:
transform(
  aggregate(times ~ category,
            data = transform(df,
                             times = as.POSIXct(format(times, '2021-09-01 %H:%M:%S'))),
            FUN = mean),
  times = format(times, '%H:%M:%S')
)
-output
category times
1 A 16:16:59
2 B 15:56:44
3 C 15:58:23
4 D 17:01:33