Tags: r, dataframe, if-statement, character-encoding, dummy-variable

Create a dummy variable based on character vectors in R


I want to create a dummy variable that equals 1 when all entries in a row (in columns value_1 through value_3) are either equal to a given character (e.g. "C") or NA, and 0 otherwise.

Toy example:

df <- data.frame(state=rep("state"),
               candidate=c("a","b","c"),
               value_1= c("A","B","C"),
               value_2= c("A","B",NA),
               value_3= c("C",NA,NA), stringsAsFactors = FALSE)

Desired output:

df <- data.frame(state=rep("state"),
             candidate=c("a","b","c"),
             value_1= c("A","B","C"),
             value_2= c("A","B",NA),
             value_3= c("C",NA,NA), 
             dummy=c(0,0,1),stringsAsFactors = FALSE)

I tried the following, but it does not work:

df$dummy <- ifelse(df[-(1:2)] %in% c("C","NA"),1,0)

Solution

  • Another way:

    rowSums(df[-(1:2)] != "C", na.rm=TRUE) == 0
    # [1] FALSE FALSE  TRUE
    

    How it works:

    • Make a matrix of checks for non-"C" values
    • Count non-"C" values by row, skipping NAs
    • If the count is 0, TRUE; else, FALSE
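
The steps above can be sketched one at a time on the toy data from the question. The intermediate names (vals, not_c, n_not_c) are illustrative and not part of the original answer, and the final conversion to 0/1 assumes the dummy should be stored as an integer column as in the desired output.

```r
# Toy data from the question
df <- data.frame(state = rep("state", 3),
                 candidate = c("a", "b", "c"),
                 value_1 = c("A", "B", "C"),
                 value_2 = c("A", "B", NA),
                 value_3 = c("C", NA, NA),
                 stringsAsFactors = FALSE)

vals <- df[-(1:2)]                        # keep only the value_* columns
not_c <- vals != "C"                      # logical matrix; NA where the entry is NA
n_not_c <- rowSums(not_c, na.rm = TRUE)   # non-"C" entries per row, NAs skipped

# as.integer() turns TRUE/FALSE into 1/0
df$dummy <- as.integer(n_not_c == 0)
df$dummy
# [1] 0 0 1
```

Because NAs are skipped when counting, a row made up entirely of NAs would also get a 1, which matches the "or are NAs" condition in the question.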

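    Confusingly, df[-(1:2)] == "C" yields a logical matrix, while df[-(1:2)] %in% "C" does not: %in% treats the data frame as a list and returns one value per column. To match element by element with %in%, convert with as.matrix(df[-(1:2)]) first; the result is then a plain vector, so its dimensions have to be restored before calling rowSums().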
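
A small sketch of the %in% route, using the same three value columns. The reshaping with matrix() and the names by_col, hits and all_c_or_na are additions for illustration, not part of the original answer.

```r
vals <- data.frame(value_1 = c("A", "B", "C"),
                   value_2 = c("A", "B", NA),
                   value_3 = c("C", NA, NA),
                   stringsAsFactors = FALSE)

# On a data frame, %in% works column by column: one result per column
by_col <- vals %in% "C"
length(by_col)   # same as ncol(vals), not one value per cell

# On a matrix, %in% is element-wise but returns a plain vector,
# so the shape is restored with matrix() (filled column by column)
m <- as.matrix(vals)
hits <- matrix(m %in% c("C", NA), nrow = nrow(m))

# Rows where every entry is "C" or NA
all_c_or_na <- rowSums(hits) == ncol(m)
all_c_or_na
# [1] FALSE FALSE  TRUE
```

Note the unquoted NA in the lookup vector: the string "NA" used in the question's attempt does not match missing values.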