From a given row, how to select the previous 'n' rows in R?


I have a dummy variable like so:

df <- data.frame(year = seq(1990, 1997, 1),
                 x = c(1, 0, 0, 0, 1, 1, 0, 0))

year  x
1990  1
1991  0
1992  0
1993  0
1994  1
1995  1
1996  0
1997  0

I want to create a dummy y equal to 1 if x is non-zero anywhere in a three-year window (the current year and the two preceding years), and NA where the window is incomplete. The expected result:

year  x   y
1990  1  NA
1991  0  NA
1992  0   1
1993  0   0
1994  1   1
1995  1   1
1996  0   1
1997  0   1

How do I do this? A dplyr solution is preferred.


Solution

  • If the window is always three rows (the current row and the two before it), you can do:

    library(dplyr)
    
    df %>% mutate(y = sign((x > 0) + (lag(x) > 0) + (lag(x, 2) > 0)))
    #>   year x  y
    #> 1 1990 1 NA
    #> 2 1991 0 NA
    #> 3 1992 0  1
    #> 4 1993 0  0
    #> 5 1994 1  1
    #> 6 1995 1  1
    #> 7 1996 0  1
    #> 8 1997 0  1
    

    A more general solution, which lets you choose the window size n, is:

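    n <- 3
    
    # Start from x itself (lag 0) via .init, then add lags 1 to n - 1; sign() maps any positive sum to 1
    df %>% mutate(y = sign(purrr::reduce(seq_len(n - 1), ~ .x + lag(x, .y), .init = x)))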
    #>   year x  y
    #> 1 1990 1 NA
    #> 2 1991 0 NA
    #> 3 1992 0  1
    #> 4 1993 0  0
    #> 5 1994 1  1
    #> 6 1995 1  1
    #> 7 1996 0  1
    #> 8 1997 0  1
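
For comparison, the same trailing-window flag can be sketched in base R without dplyr or purrr. This is only an illustrative alternative, not part of the original answer: the helper name `any_nonzero_in_window` is hypothetical, and it assumes the window covers the current row plus the `n - 1` rows before it, as in the expected output above.

```r
# Hypothetical helper (name is an assumption, not from the original answer):
# flags rows where any of the current value and the n - 1 values before it is non-zero.
any_nonzero_in_window <- function(x, n) {
  # stats::filter() with sides = 1 computes a trailing moving sum;
  # the first n - 1 positions are NA because the window is incomplete.
  # The stats:: prefix matters when dplyr is loaded, since dplyr masks filter().
  s <- stats::filter(as.numeric(x != 0), rep(1, n), sides = 1)
  as.integer(s > 0)
}

df <- data.frame(year = 1990:1997,
                 x = c(1, 0, 0, 0, 1, 1, 0, 0))
df$y <- any_nonzero_in_window(df$x, n = 3)
df
```

Because the non-zero indicator is summed rather than x itself, this sketch would also treat negative non-zero values as hits, which the `sign()` of a raw sum would not.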