Tags: r, dataframe, time, group-by, lubridate

How to find the mean of a time column and group it using R?


I have a data frame with a column named ride_length, which is already in hh:mm:ss format. I would like to calculate the mean of that column, grouped by its two categories, member and casual (found in the member_casual column).

I have tried this pipeline with the lubridate library:

df %>%
  group_by(member_casual) %>%
  seconds_to_period(mean(period_to_seconds(hms(ride_length))))

Even though my arguments are the same as in other examples found online, I still get this error:

Error in seconds_to_period(., mean(period_to_seconds(hms(ride_length)))) : unused argument (mean(period_to_seconds(hms(ride_length))))

I have also tried a longer path by doing this:

df$nride_length <- difftime(strptime(df$ride_length,"%H:%M:%S"),
                     strptime("00:00:00","%H:%M:%S"),
                     units="mins")
df.means <- aggregate(df$nride_length,by=list(df$member_casual),mean)
df.means$ride_length <- format(.POSIXct(df.means$x,tz="GMT"), "%H:%M:%S")
df.means

But the result is still problematic:

  Group.1       x ride_length
1  casual NA mins
2  member NA mins

I have also tried with summarise:

df %>%
  group_by(member_casual) %>%
  summarise(length_mean = seconds_to_period(mean(period_to_seconds(hms(ride_length)))))

But then this shows:

# A tibble: 2 × 2
  member_casual length_mean
  <chr>         <Period>   
1 casual        NA         
2 member        NA         

Warning message:
There were 2 warnings in `summarise()`.
The first warning was:
ℹ In argument: `length_mean =
  seconds_to_period(mean(period_to_seconds(hms(ride_length))))`.
ℹ In group 1: `member_casual = "casual"`.
Caused by warning in `.parse_hms()`:
! Some strings failed to parse, or all strings are NAs
ℹ Run dplyr::last_dplyr_warnings() to see the 1 remaining warning. 

Please help


Solution

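  • You can use aggregate() alone, with its formula interface: put the column to average on the left of ~ and the grouping column on the right, as you would when specifying an ANOVA. I changed the data.frame a little so that there are three rows each of "member" and "casual".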

    dtf <- structure(list(rideable_type=c("electric_bike",
      "classic_bike", "classic_bike", "electric_bike",
      "classic_bike", "classic_bike"), day_of_week=c(1, 1, 1, 6, 7,
      2), ride_length=structure(c(990, 810, 576, 296, 686, 294),
      class=c("hms", "difftime"), units="secs"),
      member_casual=c("member", "member", "member", "casual",
      "casual", "casual"), nride_length=structure(c(16.5, 13.5, 9.6,
      4.93, 11.43, 4.9), class="difftime", units="mins")),
      row.names=c(NA, -6L), class=c("tbl_df", "tbl", "data.frame"))
        
    aggregate(ride_length ~ member_casual, data=dtf, mean)
      #   member_casual    ride_length
      # 1        casual 425.33333 secs
      # 2        member 792.00000 secs
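
  • If ride_length is stored as plain "hh:mm:ss" text rather than as an hms/difftime column, the same aggregate() call can be applied after converting the text to seconds. The following is a minimal base-R sketch under that assumption. The sample data frame and the helpers to_seconds() and to_hms() are hypothetical names introduced here for illustration, not part of lubridate or of the original data. It assumes every value has exactly three colon-separated fields.

```r
# Hypothetical sample data: ride_length stored as "hh:mm:ss" text
df <- data.frame(
  member_casual = c("member", "member", "member", "casual", "casual", "casual"),
  ride_length   = c("00:16:30", "00:13:30", "00:09:36",
                    "00:04:56", "00:11:26", "00:04:54"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert "hh:mm:ss" text to a number of seconds
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Hypothetical helper: format seconds as "hh:mm:ss", rounded to whole seconds
to_hms <- function(s) {
  s <- round(s)
  sprintf("%02d:%02d:%02d", s %/% 3600, (s %% 3600) %/% 60, s %% 60)
}

# Average the numeric seconds per group, then format the means for display
df$ride_secs <- to_seconds(df$ride_length)
means <- aggregate(ride_secs ~ member_casual, data = df, FUN = mean)
means$ride_length <- to_hms(means$ride_secs)
means
```

    Averaging the numeric seconds and formatting only at the end keeps the mean itself exact. The rounding in to_hms() affects only the displayed string.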