r  data-cleaning  sapply

Return index of all binary variables that don't have a predefined name


I'm trying to write a function that returns the index of all binary variables in a data frame, with the exception of a supplied variable or list of variables. You can generate example data with this:

data <- data.frame("RESPONSE" = sample(c("YES", "NO"), 100, replace = TRUE),
                   "FACTOR" = sample(c("YES", "NO", "MAYBE"), 100, replace = TRUE),
                   "BINARY" = sample(c("YES", "NO"), 100, replace = TRUE),
                   "NUMERIC" = sample(1:100, 100, replace = TRUE))

In this case, the predefined variable to ignore is "RESPONSE":

response.variable.name<-"RESPONSE"

I can get the list of all the binary variables using:

sapply(data,function(x) nlevels(as.factor(x))==2)

and the list of all variables not named "RESPONSE" using:

!names(data) %in% response.variable.name

but the output I'm looking for ignores the predefined column or list of columns, and would be the same as what you get with:

names(data)=="BINARY"

I thought of using the two conditions inside the sapply function, but names(x) inside sapply returns NULL. I know there's an easy fix for this problem.
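
To illustrate the NULL result mentioned above: sapply passes each column to the function as a bare vector, and that vector does not carry the column's name. A minimal sketch, assuming a small made-up data frame `df` that is not part of the real data:

```r
# Hypothetical two-column data frame, for illustration only
df <- data.frame(A = c("YES", "NO"), B = c(1, 2))

# Inside the function, x is just the column's values; it has no names attribute
sapply(df, function(x) is.null(names(x)))
#    A    B 
# TRUE TRUE 
```

So a condition on the column name cannot be tested from inside the function when iterating over the columns themselves.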


Solution

    ## Desired result?
    names(data)=="BINARY"
    # [1] FALSE FALSE  TRUE FALSE
    
    ## Desired method
    response.variable.name <- "RESPONSE"
    sapply(data, function(x) nlevels(as.factor(x)) == 2) & !names(data) %in% response.variable.name
    # RESPONSE   FACTOR   BINARY  NUMERIC 
    #    FALSE    FALSE     TRUE    FALSE 
    ## same values, has names too (bonus!)
    ## wrap in `unname()` if you don't like names
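
Since the question asks for a function that returns indices, the expression above could be wrapped roughly as follows. This is only a sketch: the name `binary_index` and its `exclude` argument are made up for illustration, and the small fixed data frame stands in for the randomly sampled one so the result is predictable.

```r
# Hypothetical helper: positions of two-level columns, skipping names in `exclude`
binary_index <- function(df, exclude = character(0)) {
  # TRUE for every column with exactly two distinct levels
  is_binary <- sapply(df, function(x) nlevels(as.factor(x)) == 2)
  # Drop the columns whose names were asked to be ignored
  keep <- is_binary & !(names(df) %in% exclude)
  # Convert the logical vector to integer positions (names removed)
  which(unname(keep))
}

# Small deterministic stand-in for the example data
data <- data.frame(
  RESPONSE = c("YES", "NO", "YES", "NO"),
  FACTOR   = c("YES", "NO", "MAYBE", "NO"),
  BINARY   = c("NO", "YES", "YES", "NO"),
  NUMERIC  = c(1, 2, 3, 4)
)

binary_index(data, "RESPONSE")
# [1] 3
binary_index(data, c("RESPONSE", "BINARY"))
# integer(0)
```

Leaving out `unname()` would keep the column names on the returned positions, which may be handier for some uses.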