r dataframe subset categorical-data

Remove factor columns in data frame which contain specific value


I have a data frame that contains categorical variables: factors with either 1 or 2 levels. I am trying to remove all columns that have only one level. Since I have more than 300 categorical variables, I would like to use a loop or a function.

Here is sample code with only 5 columns to keep it simple:

B1 <- as.factor(c(1,1,1,1,1,1))
B2 <- as.factor(c(1,0,1,1,0,0))
B3 <- as.factor(c(0,1,1,0,1,0))
B4 <- as.factor(c(0,0,0,0,0,0))
B5 <- as.factor(c(1,0,1,0,1,0))
df <- data.frame(B1,B2,B3,B4,B5)

In this case I would like to drop columns B1 and B4, because they have only one level, and end up with a data frame like this:

   B2 B3 B5
1  1  0  1
2  0  1  0
3  1  1  1
4  1  0  0
5  0  1  1
6  0  0  0

I tried several approaches, but I don't get the desired result:

df1 <- data.frame(df1[,xqual[,c(1:5)] == "1" & df[ ,c(1:5)] == "0"])

or

for (i in 2:dim(df)[2]){
  df1[,i] = which(df[,i] == "1" & df[,i] == "0") 
}

Solution

  • Another base R option:

    df[sapply(df, nlevels) > 1]
      B2 B3 B5
    1  1  0  1
    2  0  1  0
    3  1  1  1
    4  1  0  0
    5  0  1  1
    6  0  0  0
    

    Or using Filter:

    # Keep columns with more than one level (the \(x) lambda syntax needs R >= 4.1)
    Filter(\(x) nlevels(x) > 1, df)
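
One caveat worth noting: both options above rely on nlevels(), which counts the levels declared on a factor, not the values actually observed, and it returns 0 for non-factor columns. If your 300 columns might include factors with unused levels, or plain numeric/character columns, a variant based on distinct observed values may be safer. The following is only a sketch under that assumption; drop_constant_cols is a hypothetical helper name, not something from the original answer or from base R.

```r
# Hypothetical helper, not part of the original answer: keep only the
# columns that have more than one distinct non-missing value.
drop_constant_cols <- function(data) {
  keep <- vapply(
    data,
    function(x) length(unique(x[!is.na(x)])) > 1,
    logical(1)
  )
  data[keep]
}

# Sample data from the question
B1 <- as.factor(c(1,1,1,1,1,1))
B2 <- as.factor(c(1,0,1,1,0,0))
B3 <- as.factor(c(0,1,1,0,1,0))
B4 <- as.factor(c(0,0,0,0,0,0))
B5 <- as.factor(c(1,0,1,0,1,0))
df <- data.frame(B1,B2,B3,B4,B5)

result <- drop_constant_cols(df)
names(result)

# A factor with a declared but unused level: nlevels() counts both
# levels even though only one value is actually observed.
unused <- data.frame(
  f = factor(c("a", "a"), levels = c("a", "b")),
  g = factor(c("x", "y"))
)
nlevels(unused$f)
names(drop_constant_cols(unused))
```

On the sample data this keeps B2, B3 and B5, the same as the nlevels() versions. In the second example the nlevels() check would keep f, while this helper drops it and keeps only g. If your factors never carry unused levels, the shorter one-liners are perfectly fine.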