I have a dataframe with date/times (a time series), site (the grouping variable) and value. I have identified the start times of different 'surges', defined as an increase in value of >= 2 within 15 minutes. For each surge start, I am trying to find the first later date/time at which the value falls back down to (or below) the value at the start of the surge (i.e., the end of the surge).
I can achieve this with a recursive function ('find.next.smaller', from this question: In a dataframe, find the index of the next smaller value for each element of a column). This works perfectly on a small dataframe, but not on a large one, where I get the error message "Error: C stack usage 15925584 is too close to the limit". Having seen other similar questions (e.g., Error: C stack usage is too close to the limit), I do not think it's a problem of infinite recursion, but a memory issue. However, I do not know how to use the shell (or PowerShell) to change the stack limit. Is there any other way, either by adjusting my memory settings or by adapting the function below?
Some example code:
###df formatting
library(dplyr)
df <- data.frame("Date_time" =seq(from=as.POSIXct("2022-01-01 00:00") , by= 15*60, to=as.POSIXct("2022-01-01 07:00")),
"Site" = rep(c("Site A", "Site B"), each = 29),
"Value" = c(10,10.1,10.2,10.3,12.5,14.8,12.4,11.3,10.3,10.1,10.2,10.5,10.4,10.3,14.7,10.1,
16.7,16.3,16.4,14.2,10.2,10.1,10.3,10.2,11.7,13.2,13.2,11.1,11.4,
rep(10.3,times=29)))
df <- df %>% group_by(Site) %>% mutate(Lead_Value = lead(Value))
df$Surge_start <- NA
df[which(df$Lead_Value - df$Value >=2),"Surge_start"] <-
paste("Surge",seq(1,length(which(df$Lead_Value - df$Value >=2)),1),sep="")
###Applying the 'find.next.smaller' function
find.next.smaller <- function(ini = 1, vec) {
if(length(vec) == 1) NA
else c(ini + min(which(vec[1] >= vec[-1])),
find.next.smaller(ini + 1, vec[-1]))
} # The recursive function goes element by element through the vector and finds
# the index of the next value that is smaller than or equal to the current one.
df$Date_time <- as.character(df$Date_time)
Output <- df %>% group_by(Site) %>% mutate(Surge_end = ifelse(grepl("Surge",Surge_start),Date_time[find.next.smaller(1, Value)],NA))
###This works fine
df2 <- do.call("rbind", replicate(1000, df, simplify = FALSE))
Output2 <- df2 %>% group_by(Site) %>% mutate(Surge_end = ifelse(grepl("Surge",Surge_start),Date_time[find.next.smaller(1, Value)],NA))
#### This does not work: it fails with the C stack usage error
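I suggest you don't need recursion: for each surge row, just take the first later time within the same site where the value is at or below the surge's starting value.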
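# For one row: return NA unless the row starts a surge; otherwise return the first
# time after time1 (within the group) at which the value is at or below val1.
find_nearest_value <- function(surge, time1, val1, times, vals) {
  if (!grepl("Surge", surge)) NA else times[times > time1 & vals <= val1][1]
}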
Output %>%
  group_by(Site) %>%
  # list(Date_time) and list(Value) pass the whole group's vectors to every mapply call
  mutate(end2 = if_else(grepl("Surge", Surge_start),
                        mapply(find_nearest_value, Surge_start, Date_time, Value, list(Date_time), list(Value)),
                        NA)) %>%
  print(n=99)
# # A tibble: 58 × 7
# # Groups: Site [2]
# Date_time Site Value Lead_Value Surge_start Surge_end end2
# <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
# 1 2022-01-01 00:00:00 Site A 10 10.1 NA NA NA
# 2 2022-01-01 00:15:00 Site A 10.1 10.2 NA NA NA
# 3 2022-01-01 00:30:00 Site A 10.2 10.3 NA NA NA
# 4 2022-01-01 00:45:00 Site A 10.3 12.5 Surge1 2022-01-01 02:00:00 2022-01-01 02:00:00
# 5 2022-01-01 01:00:00 Site A 12.5 14.8 Surge2 2022-01-01 01:30:00 2022-01-01 01:30:00
# 6 2022-01-01 01:15:00 Site A 14.8 12.4 NA NA NA
# 7 2022-01-01 01:30:00 Site A 12.4 11.3 NA NA NA
# 8 2022-01-01 01:45:00 Site A 11.3 10.3 NA NA NA
# 9 2022-01-01 02:00:00 Site A 10.3 10.1 NA NA NA
# 10 2022-01-01 02:15:00 Site A 10.1 10.2 NA NA NA
# 11 2022-01-01 02:30:00 Site A 10.2 10.5 NA NA NA
# 12 2022-01-01 02:45:00 Site A 10.5 10.4 NA NA NA
# 13 2022-01-01 03:00:00 Site A 10.4 10.3 NA NA NA
# 14 2022-01-01 03:15:00 Site A 10.3 14.7 Surge3 2022-01-01 03:45:00 2022-01-01 03:45:00
# 15 2022-01-01 03:30:00 Site A 14.7 10.1 NA NA NA
# 16 2022-01-01 03:45:00 Site A 10.1 16.7 Surge4 2022-01-01 05:15:00 2022-01-01 05:15:00
# 17 2022-01-01 04:00:00 Site A 16.7 16.3 NA NA NA
# 18 2022-01-01 04:15:00 Site A 16.3 16.4 NA NA NA
# 19 2022-01-01 04:30:00 Site A 16.4 14.2 NA NA NA
# 20 2022-01-01 04:45:00 Site A 14.2 10.2 NA NA NA
# 21 2022-01-01 05:00:00 Site A 10.2 10.1 NA NA NA
# 22 2022-01-01 05:15:00 Site A 10.1 10.3 NA NA NA
# 23 2022-01-01 05:30:00 Site A 10.3 10.2 NA NA NA
# 24 2022-01-01 05:45:00 Site A 10.2 11.7 NA NA NA
# 25 2022-01-01 06:00:00 Site A 11.7 13.2 NA NA NA
# 26 2022-01-01 06:15:00 Site A 13.2 13.2 NA NA NA
# 27 2022-01-01 06:30:00 Site A 13.2 11.1 NA NA NA
# 28 2022-01-01 06:45:00 Site A 11.1 11.4 NA NA NA
# 29 2022-01-01 07:00:00 Site A 11.4 NA NA NA NA
# 30 2022-01-01 00:00:00 Site B 10.3 10.3 NA NA NA
# 31 2022-01-01 00:15:00 Site B 10.3 10.3 NA NA NA
# 32 2022-01-01 00:30:00 Site B 10.3 10.3 NA NA NA
# 33 2022-01-01 00:45:00 Site B 10.3 10.3 NA NA NA
# 34 2022-01-01 01:00:00 Site B 10.3 10.3 NA NA NA
# 35 2022-01-01 01:15:00 Site B 10.3 10.3 NA NA NA
# 36 2022-01-01 01:30:00 Site B 10.3 10.3 NA NA NA
# 37 2022-01-01 01:45:00 Site B 10.3 10.3 NA NA NA
# 38 2022-01-01 02:00:00 Site B 10.3 10.3 NA NA NA
# 39 2022-01-01 02:15:00 Site B 10.3 10.3 NA NA NA
# 40 2022-01-01 02:30:00 Site B 10.3 10.3 NA NA NA
# 41 2022-01-01 02:45:00 Site B 10.3 10.3 NA NA NA
# 42 2022-01-01 03:00:00 Site B 10.3 10.3 NA NA NA
# 43 2022-01-01 03:15:00 Site B 10.3 10.3 NA NA NA
# 44 2022-01-01 03:30:00 Site B 10.3 10.3 NA NA NA
# 45 2022-01-01 03:45:00 Site B 10.3 10.3 NA NA NA
# 46 2022-01-01 04:00:00 Site B 10.3 10.3 NA NA NA
# 47 2022-01-01 04:15:00 Site B 10.3 10.3 NA NA NA
# 48 2022-01-01 04:30:00 Site B 10.3 10.3 NA NA NA
# 49 2022-01-01 04:45:00 Site B 10.3 10.3 NA NA NA
# 50 2022-01-01 05:00:00 Site B 10.3 10.3 NA NA NA
# 51 2022-01-01 05:15:00 Site B 10.3 10.3 NA NA NA
# 52 2022-01-01 05:30:00 Site B 10.3 10.3 NA NA NA
# 53 2022-01-01 05:45:00 Site B 10.3 10.3 NA NA NA
# 54 2022-01-01 06:00:00 Site B 10.3 10.3 NA NA NA
# 55 2022-01-01 06:15:00 Site B 10.3 10.3 NA NA NA
# 56 2022-01-01 06:30:00 Site B 10.3 10.3 NA NA NA
# 57 2022-01-01 06:45:00 Site B 10.3 10.3 NA NA NA
# 58 2022-01-01 07:00:00 Site B 10.3 NA NA NA NA
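If the row-by-row scan gets slow on very large groups (every row is compared against the whole group), a single pass that keeps a stack of "still open" indices may scale better. What follows is only a minimal sketch under some assumptions: `next_leq_index` is a helper name I made up for illustration, the rows are assumed to be sorted by time within each site, and `Value` is assumed to contain no NAs.

```r
# Hypothetical helper (illustrative name): for each element, return the index of
# the first later element that is <= it, or NA if there is none.
# Assumes `vals` is numeric, in time order, and contains no NAs.
next_leq_index <- function(vals) {
  n <- length(vals)
  out <- rep(NA_integer_, n)
  stack <- integer(n)  # preallocated stack of indices still waiting for an end
  top <- 0L            # current stack height
  for (i in seq_len(n)) {
    # Every waiting index whose value is >= the current value ends at i.
    while (top > 0L && vals[stack[top]] >= vals[i]) {
      out[stack[top]] <- i
      top <- top - 1L
    }
    # The current index now waits for its own end.
    top <- top + 1L
    stack[top] <- i
  }
  out
}

vals <- c(10.3, 12.5, 14.8, 12.4, 11.3, 10.3, 10.1)
next_leq_index(vals)
# [1]  6  4  4  5  6  7 NA
```

Each index is pushed and popped at most once, so the work grows linearly with the group size and there is no recursion at all. It returns the same kind of absolute index as `find.next.smaller(1, Value)`, so inside the grouped `mutate` it should be usable as `Date_time[next_leq_index(Value)]`.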