r, dplyr

Mutate variable based on previous observations


I am working with a dataset of U.S. House elections and want to create a variable for incumbency. Specifically, the variable should equal 1 for a party that won the last election and is running again. I have used case_when() to specify multiple contemporaneous conditions before, but I cannot get it (or any other function I know) to assign a 1 to the new incumbency variable based on past election outcomes. This seems like a simple task, but I am stuck. Any help or insight would be appreciated.

For example, here is code for a simulated dataset that is representative, except that winners are decided by plurality rule for simplicity.

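library(tidyverse)
set.seed(1)  # make the simulated vote shares reproducible

# 25 election years x 2 states x 2 districts x 2 parties = 200 rows
data <- data.frame(
  party    = rep(1:2, 100),
  district = rep(1:2, each = 2, times = 50),
  state    = rep(1:2, each = 4, times = 25),
  year     = rep(1:25, each = 8)
) %>%
  mutate(voteshare = rnorm(200, mean = 50, sd = 10)) %>%
  # rank parties within each race; rank 1 is the winner
  group_by(year, state, district) %>%
  mutate(rank = dense_rank(desc(voteshare)))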

After ranking the contemporaneous outcomes, I would like to create a new variable (i for short) that equals 1 if the party had rank == 1 in the previous election (year - 1). The closest I have gotten is the following, which skips the first election but then bases i on the contemporaneous rank rather than on the rank from the year before.

data<-data %>%
  group_by(state,district) %>%
  mutate(i=case_when(
    rank==1 & year-1 ~ 1
    ))

I have looked at similar Stack Overflow questions, but I could not see how to apply their answers to my case. I apologize if this question is a duplicate.

To be clear, I expect a new column (i) containing ones and zeros, with NAs where incumbency cannot be determined. The ones would be assigned to parties in the current election year that won (rank == 1) in the previous election year, presumably with NAs for the first election year, since there is no past election to infer incumbency from. Thank you in advance for any help!


Solution

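  • If I understand your expected results correctly, you can use lag() within each party's own time series. In your attempt, year-1 is just a number that R coerces to TRUE whenever it is nonzero, so the condition only drops year 1 and still tests the current rank. Sorting by year within each party and then lagging rank gives the previous election's result, with NA for the first election: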

    data <- data %>%
      arrange(state, district, party, year) %>%  # order year within party
      group_by(state, district, party) %>%
      mutate(i = as.numeric(lag(rank) == 1))
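
    To check the logic by hand, here is a minimal sketch on a small made-up dataset (the toy and toy_i names and their values are hypothetical, not from the question). It also adds one optional safeguard that the code above does not have: i is only filled in when the lagged row is exactly one year earlier, which may matter if a party skips an election in the real data.

```r
library(dplyr)

# Hypothetical toy data: one state, one district, two parties, three elections.
toy <- tibble(
  state    = 1,
  district = 1,
  year     = rep(1:3, each = 2),
  party    = rep(1:2, times = 3),
  rank     = c(1, 2,   # year 1: party 1 wins
               2, 1,   # year 2: party 2 wins
               2, 1)   # year 3: party 2 wins again
)

toy_i <- toy %>%
  arrange(state, district, party, year) %>%   # order years within each party
  group_by(state, district, party) %>%
  mutate(
    # Use the lagged rank only when the previous row is the previous year;
    # otherwise (including the first election) leave i as NA.
    i = if_else(lag(year) == year - 1,
                as.numeric(lag(rank) == 1),
                NA_real_)
  ) %>%
  ungroup()

print(toy_i)
```

    After sorting, party 1 has i = NA, 1, 0 and party 2 has i = NA, 0, 1 across years 1 to 3, which matches the winners listed in the comments. If every party appears in every election, as in the simulated data, the year check changes nothing and the shorter lag(rank) == 1 version is enough.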