I would like to generate a dummy treatment variable "treatment" based on country variable "iso" and earthquakes dummy variable "quake" (for dataset "data").
I would basically like to get a dummy variable "treatment" where, if quake==1 for at least one time in my entire timeframe (let's say 2000-2018), I would like all values for that "iso" have "treatment"==1, for all other countries "iso"==0. So countries that are affected by earthquakes have all observations 1, others 0.
I have tried using dplyr but since I'm still very green at R, it has taken me multiple tries and I haven't found a solution yet. I've looked on this website and google.
I suspect the solution should be something along the lines of but I can't finish it myself:
data %>%
filter(quake==1) %>%
group_by(iso) %>%
mutate(treatment)
Welcome to StackOverflow ! You should really consider Sotos's links for your next questions on SO :)
Here is a dplyr
solution (following what you started) :
## data
set.seed(123)
data <- data.frame(year = rep(2000:2002, each = 26),
iso = rep(LETTERS, times = 3),
quake = sample(0:1, 26*3, replace = T))
## solution (dplyr option)
library(dplyr)
data2 <- data %>% arrange(iso) %>%
group_by(iso) %>%
mutate(treatment = if_else(sum(quake) == 0, 0, 1))
data2
# A tibble: 78 x 4
# Groups: iso [26]
year iso quake treatment
<int> <fct> <int> <dbl>
1 2000 A 0 1
2 2001 A 1 1
3 2002 A 1 1
4 2000 B 1 1
5 2001 B 1 1
6 2002 B 0 1
7 2000 C 0 1
8 2001 C 0 1
9 2002 C 1 1
10 2000 D 1 1
# ... with 68 more rows