Say I have the following data:
date | employee | company | rolename | company_conglomerate |
---|---|---|---|---|
31-12-2008 | X | A | manager | 1 |
31-12-2008 | Y | B | manager | 0 |
31-12-2008 | Z | C | manager | 0 |
31-12-2009 | Y | A | CEO | 1 |
31-12-2009 | X | B | CEO | 0 |
31-12-2009 | Z | C | CEO | 0 |
Based on the above data, I want to create a new dummy variable that is equal to 1 if a CEO has prior experience working at a conglomerate company, and 0 otherwise.
date | employee | company | rolename | company_conglomerate | CEO_prior_conglomerate_experience |
---|---|---|---|---|---|
31-12-2008 | X | A | manager | 1 | 0 |
31-12-2008 | Y | B | manager | 0 | 0 |
31-12-2008 | Z | C | manager | 0 | 0 |
31-12-2009 | Y | A | CEO | 1 | 0 |
31-12-2009 | X | B | CEO | 0 | 1 |
31-12-2009 | Z | C | CEO | 0 | 0 |
Notice that employee X switches from company A to company B, and employee Y switches from company B to company A. Both were promoted, however, the new variable is only equal to 1 for employee X since they had prior experience at a conglomerate firm.
I can't seem to think of a solution or find my way around this problem. Any help here would be appreciated.
We can use lag(cumany(.))
after arranging by date
to determine if an employee
previously met the condition.
library(dplyr)
quux %>%
mutate(date = as.Date(date, format = "%d-%m-%Y")) %>%
arrange(date) %>%
group_by(employee) %>%
mutate(prior = (rolename == "CEO") & lag(cumany(company_conglomerate > 0))) %>%
ungroup()
# # A tibble: 6 × 6
# date employee company rolename company_conglomerate prior
# <date> <chr> <chr> <chr> <int> <lgl>
# 1 2008-12-31 X A manager 1 FALSE
# 2 2008-12-31 Y B manager 0 FALSE
# 3 2008-12-31 Z C manager 0 FALSE
# 4 2009-12-31 Y A CEO 1 FALSE
# 5 2009-12-31 X B CEO 0 TRUE
# 6 2009-12-31 Z C CEO 0 FALSE
Note that since we're ordering by date
, we need to format it for better sorting.
Data
quux <- structure(list(date = c("31-12-2008", "31-12-2008", "31-12-2008", "31-12-2009", "31-12-2009", "31-12-2009"), employee = c("X", "Y", "Z", "Y", "X", "Z"), company = c("A", "B", "C", "A", "B", "C"), rolename = c("manager", "manager", "manager", "CEO", "CEO", "CEO"), company_conglomerate = c(1L, 0L, 0L, 1L, 0L, 0L)), class = "data.frame", row.names = c(NA, -6L))