I have a dummy variable that serves as a flag for a number of conditions in my data set. I can't figure out how to write a function that marks the point at which the flag makes its "final switch", i.e. takes on a value that will not change for the rest of the data frame. In the example below, every observation from the 7th onward is "y".
dplyr::tibble(
  observation = c(seq(1,10)),
  crop = c(runif(3,1,25),
           runif(1,50,100),
           runif(2,1,10),
           runif(4,50,100)),
  flag = c(rep("n", 3),
           rep("y", 1),
           rep("n", 2),
           rep("y", 4)))
Which yields:
   observation  crop flag
         <int> <dbl> <chr>
 1           1 13.3  n
 2           2  4.34 n
 3           3 17.1  n
 4           4 80.5  y
 5           5  9.62 n
 6           6  8.39 n
 7           7 92.6  y
 8           8 74.1  y
 9           9 95.3  y
10          10 69.9  y
I've tried creating a second flag that marks every switch and returns the "final" switch/flag variable, but over my whole data frame that will likely be highly inefficient. Any suggestions are welcome and appreciated.
One way to do this may be to create a flag that cumulatively sums occurrences of flag switches.
# Cumulative sum that treats NA as 0 (the first row has no previous value to compare with)
cumsum_na <- function(x){
  x[is.na(x)] <- 0
  return(cumsum(x))
}
df <- dplyr::tibble(
  observation = c(seq(1,10)),
  crop = c(runif(3,1,25),
           runif(1,50,100),
           runif(2,1,10),
           runif(4,50,100)),
  flag = c(rep("n", 3),
           rep("y", 1),
           rep("n", 2),
           rep("y", 4)))
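library(dplyr)  # needed for %>%, mutate() and dplyr's lag()
df %>%
  # 1 where flag differs from the previous row, 0 otherwise (NA on the first row)
  mutate(flag2 = ifelse(flag != lag(flag), 1, 0) %>%
           cumsum_na)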
# A tibble: 10 x 4
   observation  crop flag  flag2
         <int> <dbl> <chr> <dbl>
 1           1 12.1  n         0
 2           2 11.2  n         0
 3           3  4.66 n         0
 4           4 61.6  y         1
 5           5  6.00 n         2
 6           6  9.54 n         2
 7           7 67.6  y         3
 8           8 86.7  y         3
 9           9 91.6  y         3
10          10 84.5  y         3
You can then do whatever you need using the flag2 column (e.g. filter for its maximum value and take the first row, which gives the first observation of the final constant state).
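As a rough sketch of that last step (not necessarily the only or fastest way), the filter could look like the code below. It assumes dplyr is attached and uses a deterministic copy of the example flags; the names final_switch and start_of_last_run are made up here for illustration. A base R cross-check with rle() is included, since run-length encoding finds the start of the last run without building a second flag column.

```r
library(dplyr)

# Deterministic copy of the example flags (crop is left out; it is not needed here)
df <- tibble(
  observation = 1:10,
  flag = c(rep("n", 3), "y", rep("n", 2), rep("y", 4))
)

final_switch <- df %>%
  # run id: goes up by 1 every time flag differs from the previous row
  mutate(flag2 = cumsum(flag != lag(flag, default = first(flag)))) %>%
  # keep only the last run, then take its first row
  filter(flag2 == max(flag2)) %>%
  slice(1)

final_switch$observation
# 7

# Base R cross-check: the last run starts at n - (length of last run) + 1
runs <- rle(df$flag)
start_of_last_run <- nrow(df) - tail(runs$lengths, 1) + 1
start_of_last_run
# 7
```

Both versions make a single pass over the column, so they should scale reasonably to a larger data frame, though that is untested here.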