How to generate dummy variables based on multiple if criteria


I need to generate a few dummy variables in R and would like your input on this.

The dataset has 10 observations per participant, one per task (taskno), and each participant is allocated to one of four treatments (treatno: 1, 2, 3, 4). In each task the participant chooses either "1" or "2" (choice). A sample of the observations is shown below.



+----+--------+---------+--------+
| id | taskno | treatno | choice |
+----+--------+---------+--------+
|  1 |      1 |       1 |      1 |
|  1 |      2 |       1 |      2 |
|  1 |      3 |       1 |      2 |
|  1 |      4 |       1 |      2 |
|  1 |      5 |       1 |      1 |
|  1 |      6 |       1 |      1 |
|  1 |      7 |       1 |      1 |
|  1 |      8 |       1 |      1 |
|  1 |      9 |       1 |      1 |
|  1 |     10 |       1 |      1 |
|  2 |      1 |       1 |      1 |
|  2 |      2 |       1 |      1 |
|  2 |      3 |       1 |      2 |
|  2 |      4 |       1 |      2 |
|  2 |      5 |       1 |      1 |
|  . |      . |       . |      . |
|  . |      . |       . |      . |
+----+--------+---------+--------+





Now I would like to generate a dummy variable, call it dummy_1, such that once a participant with treatno 1 selects choice 2, dummy_1 equals 1 for all of that participant's remaining observations (the later taskno values). It should be 0 before that point, including on the row where choice 2 is first selected.

For instance, in the example above, participant 1 first selected choice 2 in the second task. For the rest of participant 1's observations (taskno 3 to 10), dummy_1 should therefore equal 1, irrespective of the choices made in those tasks. The same applies to participant 2 (who first selects choice 2 in taskno 3, so dummy_1 is 1 from taskno 4 onward) and so on.

The expected output for dummy_1 is:




+----+--------+---------+--------+---------+
| id | taskno | treatno | choice | dummy_1 |
+----+--------+---------+--------+---------+
|  1 |      1 |       1 |      1 |       0 |
|  1 |      2 |       1 |      2 |       0 |
|  1 |      3 |       1 |      2 |       1 |
|  1 |      4 |       1 |      2 |       1 |
|  1 |      5 |       1 |      1 |       1 |
|  1 |      6 |       1 |      1 |       1 |
|  1 |      7 |       1 |      1 |       1 |
|  1 |      8 |       1 |      1 |       1 |
|  1 |      9 |       1 |      1 |       1 |
|  1 |     10 |       1 |      1 |       1 |
|  2 |      1 |       1 |      1 |       0 |
|  2 |      2 |       1 |      1 |       0 |
|  2 |      3 |       1 |      2 |       0 |
|  2 |      4 |       1 |      2 |       1 |
|  2 |      5 |       1 |      1 |       1 |
|  . |      . |       . |      . |       . |
|  . |      . |       . |      . |       . |
+----+--------+---------+--------+---------+





Any help in this regard would be appreciated. Thanks.


Solution

  • Using dplyr:

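    library(dplyr)

    your_data %>%
      arrange(id, taskno) %>%   # sort first so each participant's tasks are in order
      group_by(id) %>%
      # cumsum() > 0 flags the trigger row and everything after it;
      # lag() shifts the flag down one row so it starts on the next task
      mutate(dummy_1 = lag(as.integer(cumsum(choice == 2 & treatno == 1) > 0), default = 0L)) %>%
      ungroup()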
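
  • For comparison, the same rule can be sketched in base R. This is only a minimal illustration, not part of the original answer: the data frame df is rebuilt by hand from the rows shown in the question (participants 1 and 2 only), and the names df and hit are made up for the example. It assumes the dummy should switch on from the task after the first qualifying choice, as in the expected output.

```r
# Example data copied from the question's table (participants 1 and 2 only).
df <- data.frame(
  id      = c(rep(1, 10), rep(2, 5)),
  taskno  = c(1:10, 1:5),
  treatno = 1,
  choice  = c(1, 2, 2, 2, 1, 1, 1, 1, 1, 1,
              1, 1, 2, 2, 1)
)

# Sort so that each participant's tasks are in order.
df <- df[order(df$id, df$taskno), ]

# 1 on rows where the trigger condition holds, 0 otherwise.
hit <- as.integer(df$choice == 2 & df$treatno == 1)

# Within each id: count the triggers seen strictly before the current row
# (running total minus the current row), then flag rows where that count
# is positive.
df$dummy_1 <- ave(hit, df$id, FUN = function(x) as.integer(cumsum(x) - x > 0))

print(df)
```

    Subtracting the current row from the running total plays the same role as lag() in the dplyr version: the row where choice 2 is first selected stays at 0, and every later row for that participant becomes 1.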