
How to reference all other columns in R?


I am working with data similar to the data below:

ID <- c("A", "B", "C", "D", "E")
x1 <- c(1,1,1,1,0)
x2 <- c(0,0,1,2,2)
x3 <- c(0,0,0,0,0)
x4 <- c(0,0,0,0,0)

df <- data.frame(ID, x1, x2, x3, x4)

It looks like:

> df
  ID x1 x2 x3 x4
1  A  1  0  0  0
2  B  1  0  0  0
3  C  1  1  0  0
4  D  1  2  0  0
5  E  0  2  0  0

I want to create a new column based on a conditional statement: if x1 == 1 and all the other columns are equal to 0, then the row is coded "Positive".

How can I reference all the other columns besides x1 without having to write out the rest of the columns in the conditional statement?


Solution

  • Base R:

    df$new <- ifelse(df$x1==1 &                  ## check x1 condition
                     rowSums(df[,3:5]!=0)==0,    ## add the logical outcomes by row
                     "Positive",
                     "not_Positive")
    

    The second line is a little tricky.

    • df[,3:5] (or df[,-(1:2)]) selects all the columns except the first two. You could also use subset(df,select=x2:x4) here (although ?subset says "Warning: This is a convenience function intended for use interactively ...")
    • !=0 tests whether the values are 0 or not, returning TRUE or FALSE
    • rowSums() adds up the values (FALSE→0, TRUE→1)
    • the row sum is zero only if every logical value in that row is FALSE, i.e. none of the values in that row differ from zero
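The steps above can be traced on the sample data from the question. This is only a sketch: the intermediate names (other_cols, nonzero, n_nonzero, all_zero) are made up for illustration, and selecting the columns by name with setdiff() is an alternative to the positional df[,3:5] used in the answer, not part of it.

```r
# Sample data from the question
df <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 x1 = c(1, 1, 1, 1, 0),
                 x2 = c(0, 0, 1, 2, 2),
                 x3 = c(0, 0, 0, 0, 0),
                 x4 = c(0, 0, 0, 0, 0))

# Assumption (not in the answer above): pick "all other columns" by name,
# so the code does not depend on column positions.
# Do this before adding df$new, otherwise "new" would be selected too.
other_cols <- setdiff(names(df), c("ID", "x1"))    # "x2" "x3" "x4"

# drop = FALSE keeps a data frame even if only one column is left
nonzero   <- df[, other_cols, drop = FALSE] != 0   # logical matrix: TRUE where value is not 0
n_nonzero <- rowSums(nonzero)                      # non-zero values per row: 0 0 1 1 1
all_zero  <- n_nonzero == 0                        # TRUE TRUE FALSE FALSE FALSE

df$new <- ifelse(df$x1 == 1 & all_zero, "Positive", "not_Positive")
```

With this data only rows A and B end up as "Positive": C and D have a non-zero x2, and E fails the x1 condition.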

    If there might be NA values, then you'll need na.rm=TRUE in your rowSums() call.
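To see what the NA remark means in practice, here is a small sketch on made-up data (df_na and the names strict and lenient are hypothetical, not from the question). Note that na.rm=TRUE simply ignores missing values, so a row containing only zeros and NAs is treated as all-zero; whether that is what you want depends on your data.

```r
# Hypothetical data with missing values in x2
df_na <- data.frame(ID = c("A", "B", "C"),
                    x1 = c(1, 1, 1),
                    x2 = c(0, NA, NA),
                    x3 = c(0, 0, 3))

# Without na.rm, any NA in a row makes that row's sum NA
strict  <- rowSums(df_na[, 3:4] != 0)                 # 0 NA NA
# With na.rm = TRUE, NAs are skipped before summing
lenient <- rowSums(df_na[, 3:4] != 0, na.rm = TRUE)   # 0  0  1

res_strict  <- ifelse(df_na$x1 == 1 & strict == 0,  "Positive", "not_Positive")
# "Positive" NA NA
res_lenient <- ifelse(df_na$x1 == 1 & lenient == 0, "Positive", "not_Positive")
# "Positive" "Positive" "not_Positive"
```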