I'm trying to write a function that returns the indices of all binary variables in a data frame, except for a predefined variable or list of variables that is supplied. You can generate example data with this:
data <- data.frame("RESPONSE" = sample(c("YES", "NO"), 100, replace = TRUE),
                   "FACTOR" = sample(c("YES", "NO", "MAYBE"), 100, replace = TRUE),
                   "BINARY" = sample(c("YES", "NO"), 100, replace = TRUE),
                   "NUMERIC" = sample(1:100, 100, replace = TRUE))
In this case, the predefined variable to ignore is "RESPONSE":
response.variable.name<-"RESPONSE"
I can get the list of all the binary variables using:
sapply(data,function(x) nlevels(as.factor(x))==2)
and the list of all variables not named "RESPONSE" using:
!names(data) %in% response.variable.name
but the output I'm looking for should also ignore the predefined column (or list of columns), so that it matches what you would get with:
names(data)=="BINARY"
I thought about using the two conditions inside the sapply function, but names(x) inside sapply returns NULL. I know there's an easy fix for this problem.
## Desired result?
names(data)=="BINARY"
# [1] FALSE FALSE  TRUE FALSE
## Desired method
response.variable.name<-"RESPONSE"
sapply(data,function(x) nlevels(as.factor(x))==2) & !names(data) %in% response.variable.name
# RESPONSE   FACTOR   BINARY  NUMERIC
#    FALSE    FALSE     TRUE    FALSE
## same values, has names too (bonus!)
## wrap in `unname()` if you don't like names
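Since the question asks for a function that returns indices, the one-liner above could be wrapped up roughly as follows. This is only a sketch: the function name `binary_columns`, its `exclude` argument, and the deterministic `example_df` are made up for illustration and are not from the question. It uses `vapply()` rather than `sapply()` so the result is always a logical vector.

```r
# Hypothetical helper (name and arguments are illustrative, not from the question).
binary_columns <- function(df, exclude = character(0)) {
  # TRUE for columns that have exactly two levels once converted to a factor
  is_binary <- vapply(df, function(x) nlevels(as.factor(x)) == 2, logical(1))
  # switch off any columns named in `exclude`; names come from `is_binary`
  is_binary & !(names(df) %in% exclude)
}

# Deterministic example data, so the result does not depend on sample()
example_df <- data.frame(
  RESPONSE = rep(c("YES", "NO"), 50),
  FACTOR   = rep(c("YES", "NO", "MAYBE"), length.out = 100),
  BINARY   = rep(c("NO", "YES"), 50),
  NUMERIC  = 1:100
)

keep <- binary_columns(example_df, exclude = "RESPONSE")
which(keep)         # integer positions of the selected columns
names(keep)[keep]   # the same selection as column names
```

Passing a character vector such as `exclude = c("RESPONSE", "BINARY")` handles the "list of columns" case in the same way, because `%in%` already accepts more than one name.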