I have a data frame with a datetime column that includes hourly data. I would like to group the data by site (loc column) and calculate daily values for the remaining columns.
I've tried very basic code. For example:
dfDay <- aggregate(df, list(df$DateTime), FUN=mean)
But this returns NA for the factor columns and does not group by loc.
structure(list(DateTime = structure(c(1583427600, 1583427600,
1583427600, 1583427600, 1583427600, 1583427600), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), mAOD = c(73.4675, 73.6048, 73.7507,
74.0078, 74.0731, 74.1994), groundLevel = c(73.536, 73.68, 73.809,
74.018, 73.912, 74.729), rel_ground = c(-0.0685000000000002,
-0.0752000000000095, -0.0583000000000027, -0.0101999999999975,
0.161100000000005, -0.529600000000002), siteTemp = c(6.75, 6.1,
6.91, 7.52, 7.23, 5.48), baroTemp = c(7.57, 7.57, 7.57, 7.57,
7.57, 7.57), beaverEvent = structure(c(1L, 1L, 1L, 1L, 1L, 1L
), levels = c("BB", "AB", "D"), class = "factor"), mere = structure(c(2L,
2L, 2L, 2L, 2L, 2L), levels = c("chapel", "hatch", "coleCrose",
"crose"), class = "factor"), loc = c("SSSI3", "SSSI2", "B6",
"SSSI1", "B8", "B3"), spi = c(1.3363327, 1.3363327, 1.3363327,
1.3363327, 1.3363327, 1.3363327), spigroup = c("Moderately Wet",
"Moderately Wet", "Moderately Wet", "Moderately Wet", "Moderately Wet",
"Moderately Wet"), damEvent = structure(c(1L, 1L, 1L, 1L, 1L,
1L), levels = c("1", "2"), class = "factor"), year = structure(c(1L,
1L, 1L, 1L, 1L, 1L), levels = c("1", "2", "3"), class = "factor")), row.names = c(NA,
6L), class = "data.frame")
I have done it this way:
Check if the output is what you want:
library(dplyr)
library(lubridate)
dfDay <- df %>% mutate(only_date = date(DateTime)) %>% group_by(loc, only_date) %>% summarise_if(is.numeric, mean, na.rm=T)
## A tibble: 6 × 8
## Groups: loc [6]
# loc only_date mAOD groundLevel rel_ground siteTemp baroTemp spi
# <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 B3 2020-03-05 74.2 74.7 -0.530 5.48 7.57 1.34
#2 B6 2020-03-05 73.8 73.8 -0.0583 6.91 7.57 1.34
#3 B8 2020-03-05 74.1 73.9 0.161 7.23 7.57 1.34
#4 SSSI1 2020-03-05 74.0 74.0 -0.0102 7.52 7.57 1.34
#5 SSSI2 2020-03-05 73.6 73.7 -0.0752 6.1 7.57 1.34
#6 SSSI3 2020-03-05 73.5 73.5 -0.0685 6.75 7.57 1.34
if you want to keep the factor columns as well:
dfDay <- df %>% mutate(only_date = date(DateTime)) %>% group_by(loc, only_date) %>% summarise(across(where(is.numeric), ~ mean(.x, na.rm=T)), across(where(is.factor), ~ first(.x)), .groups='keep')
dfDay
## A tibble: 6 × 12
## Groups: loc, only_date [6]
# loc only_date mAOD groundLevel rel_ground siteTemp baroTemp spi beaverEvent mere damEvent year
# <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <fct> <fct> <fct>
#1 B3 2020-03-05 74.2 74.7 -0.530 5.48 7.57 1.34 BB hatch 1 1
#2 B6 2020-03-05 73.8 73.8 -0.0583 6.91 7.57 1.34 BB hatch 1 1
#3 B8 2020-03-05 74.1 73.9 0.161 7.23 7.57 1.34 BB hatch 1 1
#4 SSSI1 2020-03-05 74.0 74.0 -0.0102 7.52 7.57 1.34 BB hatch 1 1
#5 SSSI2 2020-03-05 73.6 73.7 -0.0752 6.1 7.57 1.34 BB hatch 1 1
#6 SSSI3 2020-03-05 73.5 73.5 -0.0685 6.75 7.57 1.34 BB hatch 1 1