Search code examples
rmeanposixct

R Average values of posixct rows by hour


I have a dataframe 'df1' that has a posixct column 'Date' and other data columns with values associated with dates and times in 'Date'. The date look like this:

Date Value1 Value2
2022-03-14 13:00:00 1 3
2022-03-14 13:10:00 2 4
2022-03-14 13:20:00 3 5
2022-03-14 13:30:00 4 6
2022-03-14 13:40:00 5 7
2022-03-14 13:50:00 6 8
2022-03-14 14:00:00 10 40
2022-03-14 14:10:00 20 50
2022-03-14 14:20:00 30 60
2022-03-14 14:30:00 40 70
2022-03-14 14:40:00 50 80
2022-03-14 14:50:00 60 90

I would like to average the values in 'Value1' and 'Value2' over all instances of each days hour and create a new dataframe 'df2' where 'Date' is now the start of each hour for each day and 'Value1' is the average values. The resulting df2 would look like:

Date Value1 Value2
2022-03-14 13:00:00 3.5 5.5
2022-03-14 14:00:00 35 65

Solution

  • You can use floor_date() from lubridate to round the date-time object down to the nearest boundary of hours.

    library(dplyr)
    library(lubridate)
    
    df %>%
      group_by(Date = floor_date(Date, "hour")) %>%
      summarise(across(contains("Value"), mean))
    
    # # A tibble: 2 × 3
    #   Date                Value1 Value2
    #   <dttm>               <dbl>  <dbl>
    # 1 2022-03-14 13:00:00    3.5    5.5
    # 2 2022-03-14 14:00:00   35     65
    

    Data
    df <- read.csv(text = "Date, Value1, Value2
    2022-03-14 13:00:00, 1, 3
    2022-03-14 13:10:00, 2, 4
    2022-03-14 13:20:00, 3, 5
    2022-03-14 13:30:00, 4, 6
    2022-03-14 13:40:00, 5, 7
    2022-03-14 13:50:00, 6, 8
    2022-03-14 14:00:00, 10, 40
    2022-03-14 14:10:00, 20, 50
    2022-03-14 14:20:00, 30, 60
    2022-03-14 14:30:00, 40, 70
    2022-03-14 14:40:00, 50, 80
    2022-03-14 14:50:00, 60, 90", colClasses = c(Date = "POSIXct"))