I am working with some discharge data that is recorded at 5-minute intervals, so 12 observations per hour. I want to filter the data to only include so many hours (rows) before and after a change in discharge from positive to negative or negative to positive (there is backflow in this river system and that's why there are negative values). I am trying to piggyback off of the answer provided by AntoniosK in this question (R - Find maximum run of positive / negative values).
So, using something like his example dataframe:
wf = data.frame(discharge = c(100, 125, 128, -25, -30, -27, 104, -23, 100, -50))
I would like to filter the column based on some criteria, such as only retain the observations where there were at least 3 values of the same sign (i.e. positive or negative) followed by a change where the next 3 values after the change (including the value where the signs changed) were all the same sign (i.e. positive or negative). Here, with this small example dataframe, the desired result would be:
[100, 125, 128, -25, -30, -27]
I can do something like the following to create a logical vector to subset the column by to identify where a change is sign occurs based on the previous value, but I haven't been able to determine how to look at so many values before and after the change:
(c(0, diff(sign(wf$discharge))) != 0)
Like I mentioned, I was trying to work with the dplyr
solution given by AntoniosK at a starting point. I'm not sure if this will require using rle
to look at runs, or maybe something with lag
and lead
. It will probably be something simple that I am just overlooking at the moment and drawing a blank on.
Here's what I came up with that involves using a loop for the solution.
# Looping solution
# Simple dataframe
wf = data.frame(discharge = c(100, 125, 128, -25, -30, -27, 104, -23, 100, -50))
# AntoniosK dplyr method to detect sign changes in the values of a column.
wf_change <- wf %>%
mutate(sign = ifelse(discharge > 0,"pos", ifelse(discharge < 0, "neg", "zero")),
sign_lag = lag(sign, default = sign[1]),
change = ifelse(sign != sign_lag, 1 , 0),
series_id = cumsum(change) + 1)
# Fill a dummy column that will be used to filter the dataframe
wf_change$retain <- FALSE
# Loop through the variable/column of interest. If there is a change from the
# previous value, look so many values behind and so many values ahead.
# In this small example, just 3 behind and 2 ahead, although my look_behind
# and look_ahead include the value at the change as well. Could also look at the
# change variable in the look_behind and look_ahead instead of the variable of
# interest (discharge).
for (i in 1:length(wf_change$discharge)){
if(wf_change$change[i] == 1){
look_behind = sign(wf_change$discharge[(i-3):i])
look_ahead = sign(wf_change$discharge[i:(i+2)])
# If variance equals 0 for the 3 values before and 2 values after the
# change, then that means the values in the vector are all the same sign.
if(var(look_behind[1:length(look_behind)-1]) == 0 & var(look_ahead[-1]) == 0){
# Assign TRUE to the dummy variable retain.
wf_change$retain[(i-3):(i+2)] <- TRUE
}
}
}
# Filter the dataframe to include only those desired observations that meet
# your criteria.
wf_final <- wf_change %>% filter(retain)