Tags: r, filtering, data-manipulation, dummy-variable

How to generate a dummy treatment variable based on values from two different variables


I would like to generate a dummy treatment variable "treatment" based on the country variable "iso" and the earthquake dummy variable "quake" (in the dataset "data").

Basically, if quake==1 at least once for a country during my entire timeframe (let's say 2000-2018), I would like every observation for that "iso" to have "treatment"==1; for all other countries, "treatment"==0. So countries that were affected by an earthquake have 1 in all their observations, and the others have 0.

I have tried using dplyr, but since I'm still very green at R, it has taken me multiple tries and I haven't found a solution yet. I've searched this website and Google.

I suspect the solution should be something along the lines of the following, but I can't finish it myself:

data %>%
filter(quake==1) %>%
group_by(iso) %>%
mutate(treatment)

Solution

  • Welcome to StackOverflow! You should really consider Sotos's links for your next questions on SO :) Here is a dplyr solution (following what you started):

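    ## data: 26 countries (A-Z) observed over 3 years, with a random quake dummy
    set.seed(123)
    data <- data.frame(year = rep(2000:2002, each = 26),
                       iso = rep(LETTERS, times = 3),
                       quake = sample(0:1, 26*3, replace = TRUE))
    ## solution (dplyr option)
    library(dplyr)
    ## do not filter on quake == 1 first: that would drop the rows you want to flag
    data2 <- data %>% arrange(iso) %>%
            group_by(iso) %>%
            ## sum(quake) is the number of quakes per country: 0 means untreated
            mutate(treatment = if_else(sum(quake) == 0, 0, 1))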
    data2 
    # A tibble: 78 x 4
    # Groups:   iso [26]
        year iso   quake treatment
       <int> <fct> <int>     <dbl>
     1  2000 A         0         1
     2  2001 A         1         1
     3  2002 A         1         1
     4  2000 B         1         1
     5  2001 B         1         1
     6  2002 B         0         1
     7  2000 C         0         1
     8  2001 C         0         1
     9  2002 C         1         1
    10  2000 D         1         1
    # ... with 68 more rows
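
The same idea can also be written with `any()`, which states the "at least one quake" condition more directly than comparing a sum to zero. Below is a minimal sketch on a small made-up data frame (`toy` and its values are assumptions for illustration, not the asker's data), followed by a base R equivalent using `ave()` for readers who prefer not to load dplyr.

```r
library(dplyr)

# Hypothetical toy data: country "A" has one quake, country "B" has none
toy <- data.frame(
  year  = rep(2000:2002, times = 2),
  iso   = rep(c("A", "B"), each = 3),
  quake = c(0L, 1L, 0L, 0L, 0L, 0L)
)

# dplyr: any(quake == 1) is evaluated once per country and recycled
# across all of that country's rows
toy_dplyr <- toy %>%
  group_by(iso) %>%
  mutate(treatment = as.integer(any(quake == 1))) %>%
  ungroup()

# base R: ave() applies max() within each iso group, so a single 1
# in quake propagates to every row of that country
toy$treatment <- ave(toy$quake, toy$iso, FUN = max)
```

Both versions keep every row of the original data; the `ungroup()` at the end simply avoids carrying the grouping into later steps.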