I have a toy electoral database and need to calculate incumbency, but I cannot work out how to do it using grouped values and dplyr::lag.
library(dplyr)

race <- data.frame(city = rep(1, 6),
                   date = c(3, 3, 2, 2, 1, 1),
                   candidate = c("A", "B", "A", "C", "D", "E"),
                   winner = rep(c(1, 0), 3))
I made a convoluted attempt that is not ideal (as I would still have to merge the non-winners back in):
race %>%
  group_by(city, date) %>%
  mutate(win_candidate = candidate[winner == 1]) %>%
  filter(winner == 1) %>%
  ungroup() %>%
  group_by(city) %>%
  mutate(incumbent = lead(win_candidate, n = 1, default = NA_character_),
         incumbent = ifelse(candidate == incumbent, 1, 0)) %>%
  select(-win_candidate)
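How about this: find the winner of each election, lag it within each city to get the previous winner, then join that back onto the full table so the non-winners are kept.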
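r <- race %>%
  # one row per election: the winning candidate
  group_by(city, date) %>%
  summarise(win_candidate = candidate[which(winner == 1)]) %>%
  ungroup() %>%
  # within each city, in chronological order, look up the previous winner
  group_by(city) %>%
  arrange(date) %>%
  mutate(prev_win_candidate = lag(win_candidate)) %>%
  # attach the previous winner to every candidate, winners and losers alike
  left_join(race, ., by = c("city", "date")) %>%
  # the earliest election has no previous winner (NA), so count that as 0
  mutate(incumbent = as.numeric(candidate == prev_win_candidate),
         incumbent = case_when(
           is.na(incumbent) ~ 0,
           TRUE ~ incumbent)) %>%
  select(-c(win_candidate, prev_win_candidate))
r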
#   city date candidate winner incumbent
# 1    1    3         A      1         1
# 2    1    3         B      0         0
# 3    1    2         A      1         0
# 4    1    2         C      0         0
# 5    1    1         D      1         0
# 6    1    1         E      0         0
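For what it's worth, the same idea can probably be written a bit more compactly. This is only a sketch, assuming there is exactly one winner per city and date: it uses filter() instead of summarise() to get the winners, and coalesce() instead of case_when() to turn the NA for the first election into 0. The names prev_winners and result are just illustrative, not from the code above.

```r
library(dplyr)

race <- data.frame(city = rep(1, 6),
                   date = c(3, 3, 2, 2, 1, 1),
                   candidate = c("A", "B", "A", "C", "D", "E"),
                   winner = rep(c(1, 0), 3))

# Previous winner per election: keep winners only, sort chronologically
# within each city, and shift the winner's name down by one election.
# Assumption: exactly one row with winner == 1 per city and date.
prev_winners <- race %>%
  filter(winner == 1) %>%
  group_by(city) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(prev_win_candidate = lag(candidate)) %>%
  ungroup() %>%
  select(city, date, prev_win_candidate)

# Join back onto every candidate row. A candidate is the incumbent when
# their name matches the previous winner; the first election in a city
# has no previous winner, so the NA comparison is mapped to 0.
result <- race %>%
  left_join(prev_winners, by = c("city", "date")) %>%
  mutate(incumbent = coalesce(as.numeric(candidate == prev_win_candidate), 0)) %>%
  select(-prev_win_candidate)

result
```

Because the lag is computed on the winners-only table, the row order of race itself does not matter, and the left join keeps all six candidate rows.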