My dataframe is from a high-frequency buoy taking measurements at discrete depths in a lake.
library(rLakeAnalyzer)
library(tidyverse)
library(plyr)
PFL_counter = c(2,2,2,3,3,3,4,4)
depth = c(0.5,1.0,1.5,0.5,1.0,1.5,0.5,1.0)
temp_C = c(14.27,14.22,14.20,14.23,14.23,14.22,14.23,14.22)
datetime = c("5/11/23 17:01","5/11/23 17:02","5/11/23 17:04",
"5/11/23 18:01","5/11/23 18:22","5/11/23 19:14",
"5/11/23 19:14","5/11/23 19:16")
Carmi_23 = data.frame(PFL_counter, depth ,temp_C, datetime)
Carmi_23$datetime = mdy_hm(Carmi_23$datetime)
Carmi_23
The actual data have 58522 observations, depth increases from 0.5 to 9 MAX (e.g. sometimes will only go to 7) by 0.5 increments, the counter increases by 1 for every depth profile completed (every time depth goes back to 0.5).
There should be 1 depth profile for each hour. My goal is to have each datetime rounded to the hour for its depth profile (you could think of this as its PFL_counter group). Like so:
PFL_counter = c(2,2,2,3,3,3,4,4)
depth = c(0.5,1.0,1.5,0.5,1.0,1.5,0.5,1.0)
temp_C = c(14.27,14.22,14.20,14.23,14.23,14.22,14.23,14.22)
datetime = c("5/11/23 17:00","5/11/23 17:00","5/11/23 17:00",
"5/11/23 18:00","5/11/23 18:00","5/11/23 18:00",
"5/11/23 19:00","5/11/23 19:00")
Carmi_23 = data.frame(PFL_counter, depth ,temp_C, datetime)
Carmi_23$datetime = mdy_hm(Carmi_23$datetime)
Carmi_23
The issue is that for some depth profiles it took over an hour, so the last couple of depths are not within the same hour as when the profile began (at 0.5 depth). Recreated issue in first code chunk, above, in row 6.
My first approach:
Carmi_23$datetime = floor_date(Carmi_23$datetime, "hour") # tried this
Carmi_23$datetime = round_date(Carmi_23$datetime, "10 mins") # also this
Carmi_23$datetime = floor_date(Carmi_23$datetime, "hour")
This becomes an issue when I need to pivot_wider because then I get half a row for one hour and half a row for another (and I can't have NAs for rLakeAnalyzer heat map). In the example data:
testingpivot = Carmi_23 %>%
select(datetime=datetime, depth=depth, temp_C=temp_C) %>%
pivot_wider(id_cols=datetime,
names_from=depth,
values_from=temp_C,
values_fn=function(x) mean(x, na.rm=TRUE)
) %>%
rename_with(~ paste0("wtr_", .), -datetime)
#%>% na.omit() leave until figured out
testingpivot
I also looked into openair
library(openair)
Carmi_23$date = Carmi_23$datetime
Carmi_openair = timeAverage(Carmi_23, avg.time = "hour", type = "PFL_counter") # datetime needs to be called "date"
But this averages all the column values so that the time is collapsed into one row per avg hour...
My next attempts are trying to group_by and mean and summarize (take the average by PFL_counter, and then round down), but the key is that I need the datetime to repeat for all depths within its respective depth profile AND the length of depths per PFL_counter is unpredictable (in example data I replicated this by having 3 rows for two depth profiles and 2 rows for one).
Does anyone have any ideas? I could just NA omit rows after pivoting for which the hour got split in half, but a very stubborn part of me wants to figure it out.
R
solution:Carmi_23$datetime <- ave(
Carmi_23$datetime,
Carmi_23$PFL_counter,
FUN = function (x)
min(lubridate::floor_date(x, "hour"))
)
dplyr
solution:library(dplyr)
Carmi_23 |>
mutate(datetime = min(lubridate::floor_date(datetime, "hour")), .by = PFL_counter)
> Carmi_23
PFL_counter depth temp_C datetime
1 2 0.5 14.27 2023-05-11 17:00:00
2 2 1.0 14.22 2023-05-11 17:00:00
3 2 1.5 14.20 2023-05-11 17:00:00
4 3 0.5 14.23 2023-05-11 18:00:00
5 3 1.0 14.23 2023-05-11 18:00:00
6 3 1.5 14.22 2023-05-11 18:00:00
7 4 0.5 14.23 2023-05-11 19:00:00
8 4 1.0 14.22 2023-05-11 19:00:00
> dput(Carmi_23)
structure(list(PFL_counter = c(2, 2, 2, 3, 3, 3, 4, 4), depth = c(0.5,
1, 1.5, 0.5, 1, 1.5, 0.5, 1), temp_C = c(14.27, 14.22, 14.2,
14.23, 14.23, 14.22, 14.23, 14.22), datetime = structure(c(1683824460,
1683824520, 1683824640, 1683828060, 1683829320, 1683832440, 1683832440,
1683832560), class = c("POSIXct", "POSIXt"), tzone = "UTC")), class = "data.frame", row.names = c(NA,
-8L))