I have a dataframe 'df1' that has a posixct column 'Date' and other data columns with values associated with dates and times in 'Date'. The date look like this:
Date | Value1 | Value2 |
---|---|---|
2022-03-14 13:00:00 | 1 | 3 |
2022-03-14 13:10:00 | 2 | 4 |
2022-03-14 13:20:00 | 3 | 5 |
2022-03-14 13:30:00 | 4 | 6 |
2022-03-14 13:40:00 | 5 | 7 |
2022-03-14 13:50:00 | 6 | 8 |
2022-03-14 14:00:00 | 10 | 40 |
2022-03-14 14:10:00 | 20 | 50 |
2022-03-14 14:20:00 | 30 | 60 |
2022-03-14 14:30:00 | 40 | 70 |
2022-03-14 14:40:00 | 50 | 80 |
2022-03-14 14:50:00 | 60 | 90 |
I would like to average the values in 'Value1' and 'Value2' over all instances of each days hour and create a new dataframe 'df2' where 'Date' is now the start of each hour for each day and 'Value1' is the average values. The resulting df2 would look like:
Date | Value1 | Value2 |
---|---|---|
2022-03-14 13:00:00 | 3.5 | 5.5 |
2022-03-14 14:00:00 | 35 | 65 |
You can use floor_date()
from lubridate
to round the date-time object down to the nearest boundary of hours.
library(dplyr)
library(lubridate)
df %>%
group_by(Date = floor_date(Date, "hour")) %>%
summarise(across(contains("Value"), mean))
# # A tibble: 2 × 3
# Date Value1 Value2
# <dttm> <dbl> <dbl>
# 1 2022-03-14 13:00:00 3.5 5.5
# 2 2022-03-14 14:00:00 35 65
df <- read.csv(text = "Date, Value1, Value2
2022-03-14 13:00:00, 1, 3
2022-03-14 13:10:00, 2, 4
2022-03-14 13:20:00, 3, 5
2022-03-14 13:30:00, 4, 6
2022-03-14 13:40:00, 5, 7
2022-03-14 13:50:00, 6, 8
2022-03-14 14:00:00, 10, 40
2022-03-14 14:10:00, 20, 50
2022-03-14 14:20:00, 30, 60
2022-03-14 14:30:00, 40, 70
2022-03-14 14:40:00, 50, 80
2022-03-14 14:50:00, 60, 90", colClasses = c(Date = "POSIXct"))