I have a data frame that looks like this:
structure(list(datetime = structure(c(1559347200, 1559350800,
1559354400, 1559358000, 1559361600, 1559365200, 1559368800, 1559372400,
1559376000, 1559379600), tzone = "UTC", class = c("POSIXct",
"POSIXt")), BUSwt = c(77.8154, 77.2538, 76.9325, 76.81235, 76.6544,
76.3403, 76.16165, 76.22375, 76.47035, 76.74395), MBwt = c(78.3653,
78.53495, 78.6074, 78.30725, 78.0368, 77.9477, 77.8946, 77.927,
78.12455, 78.23615), SUSwt = c(76.2575, 76.0055, 75.7265, 75.425,
75.137, 74.849, 74.651, 74.624, 74.696, 74.8535)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
It has 5 years of datetime data for the three variable columns "BUSwt", "MBwt", and "SUSwt". I want to calculate the mean, min, and max of of these variable for each specific date and hour over the course of the 5 years of data. An example of this data would look like:
datetime | BUSwt_mean | BUSwt_max | BUSwt_min | MBwt_mean | MBwt_max | MBwt_min | SUSwt_mean | SUSwt_max | SUSwt_min |
---|---|---|---|---|---|---|---|---|---|
06-01 00:00:00 | xx | xx | xx | xx | xx | xx | xx | xx | xx |
06-01 01:00:00 | xx | xx | xx | xx | xx | xx | xx | xx | xx |
06-01 02:00:00 | xx | xx | xx | xx | xx | xx | xx | xx | xx |
06-01 03:00:00 | xx | xx | xx | xx | xx | xx | xx | xx | xx |
06-01 04:00:00 | xx | xx | xx | xx | xx | xx | xx | xx | xx |
My current code to make the data frame looks like this:
WaterData <- BUSdata %>%
left_join(MBdata, by = "datetime") %>%
left_join(SUSdata, by = "datetime")
Is there a way to do this continuing in a pipe?
Another way to get what you need, using dplyr
's handy across()
function:
library(tidyverse)
df %>%
mutate(month = month(datetime),
day = day(datetime),
hour = hour(datetime)) %>%
group_by(month, day, hour) %>%
summarise(across(.cols = c("BUSwt", "MBwt", "SUSwt"),
.fns = list("min" = min, "mean" = mean, "max" = max))) %>%
ungroup()
This solution separates month, day, and hour into their own columns. If you really need a date format (e.g., for plotting purposes), then you can coerce everything to the same year (doesn't matter what year it is), like this:
df %>%
mutate(datetime = `year<-`(datetime, 2000)) %>%
group_by(datetime) %>%
summarise(across(.cols = c("BUSwt", "MBwt", "SUSwt"),
.fns = list("min" = min, "mean" = mean, "max" = max))) %>%
ungroup()