Define a dummy variable based on binary code in R


Take the following patient data example from a hospital.

YEAR <- sample(1980:1995, 15, replace = TRUE)
Pat_ID <- sample(1:100, 15)
sex <- c(1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0)

df1 <- data.frame(Pat_ID, YEAR, sex)

I want to introduce a dummy variable df1$PAIR_IDENTIFIER that takes a new value each time sex == 1 appears. The problem is that there is no constant pattern to the sex variable.

Sometimes the next 1 appears two rows after the previous one, sometimes three rows after, and so on.

So df1$PAIR_IDENTIFIER should be c(1,1,2,2,3,3,3,4,4,4,4,4, ...)


Solution

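  • You can do this simply with cumsum(). Because sex is coded 0/1, its running total goes up by one at every 1 and stays constant across the 0s that follow: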

    df1$PAIR_IDENTIFIER <- cumsum(df1$sex)
    df1
    #   Pat_ID YEAR sex PAIR_IDENTIFIER
    #1      54 1991   1               1
    #2     100 1992   0               1
    #3       6 1995   1               2
    #4      99 1994   0               2
    #5      42 1988   1               3
    #6      65 1990   0               3
    #7      53 1994   0               3
    #8      96 1987   1               4
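
Since YEAR and Pat_ID are sampled at random, the table above will differ from run to run, but the grouping depends only on sex. As a minimal sketch, the same idea can be checked on the fixed sex vector from the question alone. The names pair_id and pair_id_alt are illustrative, and the sex == 1 variant is an assumption added here for columns that are logical or not strictly coded 0/1; it is not part of the original answer.

```r
# Fixed copy of the sex vector from the question (no randomness involved).
sex <- c(1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0)

# Running count of 1s: increases at each 1, stays flat across the 0s after it.
pair_id <- cumsum(sex)
print(pair_id)
# [1] 1 1 2 2 3 3 3 4 4 4 4 4 5 5 5

# Variant (assumption): compare against 1 first, so the result is still a
# count of matching rows even if sex is logical or uses other codes.
pair_id_alt <- cumsum(sex == 1)
```

Note that any rows that come before the first 1 would get the identifier 0 with this approach, which may or may not be what you want.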