Search code examples
rdplyrgroup-bylaglead

Calculate lag string in group


I have a toy electoral db and need to calculate incumbency but cannot using grouped values and dplyr::lag

race <- data.frame(city=rep(1,6),
                   date=c(3,3,2,2,1,1),
                   candidate=c("A","B","A","C","D","E"),
                   winner=rep(c(1,0),3))

I made a convoluted attempt that is not ideal (as I have to merge in non-winners:

race %>%
  group_by(city,date) %>% 
  mutate(win_candidate=candidate[winner==1]) %>% 
  filter(winner==1) %>% 
  ungroup() %>%
  group_by(city) %>% 
  mutate(incumbent=lead(win_candidate, n=1, default = NA_character_),
         incumbent=ifelse(candidate==incumbent,1,0)) %>%
  select(-win_candidate)


Solution

  • How about this:

    r <- race %>%
      group_by(city,date) %>% 
      summarise(win_candidate = candidate[which(winner== 1)]) %>% 
      ungroup %>% 
      group_by(city) %>% 
      arrange(date) %>% 
      mutate(prev_win_candidate = lag(win_candidate)) %>% 
      left_join(race, .) %>%
      mutate(incumbent = as.numeric(candidate == prev_win_candidate), 
             incumbent = case_when(
               is.na(incumbent) ~ 0, 
               TRUE ~ incumbent)) %>% 
      select(-c(win_candidate, prev_win_candidate))
      
    #   city date candidate winner incumbent
    # 1    1    3         A      1         1
    # 2    1    3         B      0         0
    # 3    1    2         A      1         0
    # 4    1    2         C      0         0
    # 5    1    1         D      1         0
    # 6    1    1         E      0         0