I have a data frame that contains categorical variables: factors with either 1 or 2 levels. I am trying to remove all columns that have only one level. Since I have more than 300 categorical variables, I would like to use a loop or a function.
Here is sample code with only 5 columns to keep it simple:
B1 <- as.factor(c(1,1,1,1,1,1))
B2 <- as.factor(c(1,0,1,1,0,0))
B3 <- as.factor(c(0,1,1,0,1,0))
B4 <- as.factor(c(0,0,0,0,0,0))
B5 <- as.factor(c(1,0,1,0,1,0))
df <- data.frame(B1,B2,B3,B4,B5)
In this case I would like to drop columns B1 and B4, because they have only one level, and end up with a data frame like this:
  B2 B3 B5
1 1 0 1
2 0 1 0
3 1 1 1
4 1 0 0
5 0 1 1
6 0 0 0
I tried several approaches, but I don't get the desired result:
df1 <- data.frame(df1[,xqual[,c(1:5)] == "1" & df[ ,c(1:5)] == "0"])
or
for (i in 2:dim(df)[2]){
df1[,i] = which(df[,i] == "1" & df[,i] == "0")
}
Another base R option:
df[sapply(df, nlevels) > 1]
  B2 B3 B5
1 1 0 1
2 0 1 0
3 1 1 1
4 1 0 0
5 0 1 1
6 0 0 0
Or using Filter:
Filter(\(x) nlevels(x) - 1, df)
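One caveat worth noting: nlevels() counts the levels declared on a factor, not the values actually present, and it returns 0 for non-factor columns. If the data frame has been subset, or if it mixes in numeric or character columns, counting the distinct observed values may be closer to what is wanted. Below is a minimal sketch under that assumption; drop_constant_cols is a hypothetical helper name, not something from the answer above, and ignoring NA when counting is a design choice you may want to change.

```r
# Hypothetical helper: keep only columns with more than one distinct
# observed (non-NA) value. Works for factors and non-factors alike.
drop_constant_cols <- function(data) {
  keep <- vapply(
    data,
    function(x) length(unique(x[!is.na(x)])) > 1,
    logical(1)
  )
  data[keep]
}

# Same sample data as in the question
df <- data.frame(
  B1 = factor(c(1, 1, 1, 1, 1, 1)),
  B2 = factor(c(1, 0, 1, 1, 0, 0)),
  B3 = factor(c(0, 1, 1, 0, 1, 0)),
  B4 = factor(c(0, 0, 0, 0, 0, 0)),
  B5 = factor(c(1, 0, 1, 0, 1, 0))
)

names(drop_constant_cols(df))      # B2, B3, B5

# After subsetting, B2 is constant but still carries two declared levels,
# so a check based on nlevels() would keep it.
df_sub <- df[df$B2 == "1", ]
nlevels(df_sub$B2)                 # still 2
names(drop_constant_cols(df_sub))  # B3, B5
```

If all columns are factors and unused levels should simply be discarded first, calling droplevels(df) before the nlevels() check is another way to get the same effect.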