Search code examples
rlubridate

Finding the mean time (independent of date) that an event occurs in R


I have a list of date-times corresponding to events that occurred over multiple days, and I am hoping to find the mean time that different categories of events occurred, independent of date (i.e., the mean time for events falling into category A was 11:04:42). I have a data frame that looks similar to the following:

df <- data.frame(category = c("A", "A", "B", "A", "C", "C", "B", "D", "A", "D", "D", "C"),
                    times = ymd_hms(c("2021-09-12 21:34:22", "2021-09-13 15:42:37",
                            "2021-09-16 22:36:50", "2021-09-24 09:41:00",
                            "2021-09-20 12:14:30", "2021-09-15 16:40:39",
                            "2021-09-15 09:16:39", "2021-09-14 15:50:47",
                            "2021-09-24 18:10:00", "2021-09-21 17:30:00",
                            "2021-09-14 17:43:53", "2021-09-23 19:00:00")))

I would like to find the mean time for all events in category A, but when I call something like mean(times), the output is a date and a time, whereas I would just like a time, independent of what day each event occurred.

As an example, I have tried summarizing the data frame, like so:

summary_times <- df %>%
  group_by(category) %>%
  summarize(avg_time = mean(times))

The result is "2021-09-18 06:20:06 UTC", which is not what I would like—I'm interested in generalizing to any given day, so I'm hoping for a time that does not take the dates of the individual events into account.

I have also tried taking the individual means of the hours, minutes, and seconds, and then taking the means of those individually, but I have not been successful with that, either. My first attempt looked like this:

summary_times <- df %>% 
  group_by(category) %>%
  summarize(avg_time = paste(mean(hour(times)), ":",
                             mean(minute(times)), ":",
                             mean(second(times))))

This gave me a "time" (just as a character object, which is fine with me; this is just being displayed in a table), but each of the hour, minute, and seconds had decimal remainders. This problem led me to try this next iteration:

summary_times <- df %>% 
  group_by(category) %>%
  summarize(avg_time = paste(sum(hour(times)) %/% n(), ":",
                             sum(minute(times)) %/% n() + (sum(hour(median_datetime)) %% n())*60, ":",
                             sum(second(times)) %/% n() + (sum(minute(median_datetime)) %% n())*60))

I no longer got decimal remainders on each component of the time; however, some of the components were greater than they could possibly be (e.g., a time of "15:247:130").

Any assistance in how to find this mean time in the day of events—either by pointing in the direction of a function that can perform this task, or by investigating the taking-the-mean-of-the-individual-components option—would be greatly appreciated!


Solution

  • An option is to convert to ITime and then get the mean

    library(data.table)
    library(dplyr)
    df %>%
        group_by(category) %>%
        summarise(avg_time = mean(as.ITime(times)))
    

    -output

    # A tibble: 4 × 2
      category avg_time
      <chr>    <ITime> 
    1 A        16:16:59
    2 B        15:56:44
    3 C        15:58:23
    4 D        17:01:33
    

    Or another option is to change the 'date' part to a standardized single date, and then take the mean, format to return only the 'time' part

    df %>% 
       group_by(category) %>% 
       summarise(times = format(mean(as.POSIXct(format(times, 
              '2021-09-01 %H:%M:%S'))), '%H:%M:%S'))
    # A tibble: 4 × 2
      category times   
      <chr>    <chr>   
    1 A        16:16:59
    2 B        15:56:44
    3 C        15:58:23
    4 D        17:01:33
    

    Or do this in base R

    transform(aggregate(times ~ category, 
         data = transform(df, 
              times= as.POSIXct(format(times, '2021-09-01 %H:%M:%S'))), mean),
         times = format(times, '%H:%M:%S'))
    

    -output

     category    times
    1        A 16:16:59
    2        B 15:56:44
    3        C 15:58:23
    4        D 17:01:33