I'm using the nearZeroVar function from the caret package in R to identify zero-variance variables in my analysis dataset. Here is an example of my code:
train1_nzv_y0 <- nearZeroVar(train1[train1$y=="no",], saveMetrics= TRUE)
train1_nzv_y0[train1_nzv_y0$zeroVar=="TRUE",]
freqRatio percentUnique zeroVar nzv
x1 0 0.0001649599 TRUE TRUE
x2 0 0.0001649599 TRUE TRUE
x3 0 0.0001649599 TRUE TRUE
x4 0 0.0001649599 TRUE TRUE
After producing the data frame train1_nzv_y0, I'd like to pull out the names of the variables (x1 to x4) that were identified as zero-variance variables.
However, the variable names have no column header: they are "floating" in front of the first column (freqRatio), and I can't find a way to refer to them.
Is there a standard trick in R for referring to these "ghost" columns with missing headers?
They are the row names of the data frame, so you can use rownames() to get the variable names:
rownames(train1_nzv_y0[train1_nzv_y0$zeroVar=="TRUE",])
This will return a character vector containing "x1", "x2", "x3", "x4".
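A minimal sketch that runs without caret: the data frame below is hand-built (hypothetical values, plus an extra row x5 so the filter has something to exclude) to mimic the shape of the saveMetrics = TRUE output. Because zeroVar is already a logical column, it can index the row names directly; comparing it to the string "TRUE" also works, but only because R coerces the logical to character first.

```r
# Hypothetical stand-in for nearZeroVar(..., saveMetrics = TRUE) output:
# the variable names are stored as row names, not in a column.
metrics <- data.frame(
  freqRatio     = c(0, 0, 0, 0, 1.5),
  percentUnique = c(0.0001649599, 0.0001649599, 0.0001649599, 0.0001649599, 2.5),
  zeroVar       = c(TRUE, TRUE, TRUE, TRUE, FALSE),
  nzv           = c(TRUE, TRUE, TRUE, TRUE, FALSE),
  row.names     = c("x1", "x2", "x3", "x4", "x5")  # x5 is a made-up non-zero-variance row
)

# zeroVar is logical, so it can index the row-name vector directly
zero_var_names <- rownames(metrics)[metrics$zeroVar]
print(zero_var_names)

# Alternative: copy the row names into a real column with a header
metrics$variable <- rownames(metrics)
print(metrics$variable[metrics$zeroVar])
```

If zeroVar could ever contain NA, wrapping the condition in which() avoids NA entries in the result.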
A reproducible example:
# install.packages("caret")
library(caret)
# Create example dataset
set.seed(123)
train1 <- data.frame(
y = sample(c("yes", "no"), 100, replace=TRUE),
x1 = c(rep(0, 50), rep(NA, 50)), # only 0 or NA: zero variance once NAs are ignored
x2 = c(rep(0, 50), rep(1, 50)), # normal variance
x3 = c(rep(0, 50), rep(NA, 50)), # only 0 or NA: zero variance once NAs are ignored
x4 = 1:100 # normal variance
)
train1_nzv_y0 <- nearZeroVar(train1[train1$y=="no",], saveMetrics=TRUE)
train1_nzv_y0[train1_nzv_y0$zeroVar=="TRUE",]
rownames(train1_nzv_y0[train1_nzv_y0$zeroVar=="TRUE",])
Output:
> train1_nzv_y0[train1_nzv_y0$zeroVar=="TRUE",]
freqRatio percentUnique zeroVar nzv
y 0 2.325581 TRUE TRUE
x1 0 2.325581 TRUE TRUE
x3 0 2.325581 TRUE TRUE
>
> rownames(train1_nzv_y0[train1_nzv_y0$zeroVar=="TRUE",])
[1] "y" "x1" "x3"
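Note that y is flagged as well: after subsetting to y == "no", the outcome column holds a single value, so it has zero variance by construction. One way to keep it out is to drop it before the check, e.g. nearZeroVar(train1[train1$y == "no", setdiff(names(train1), "y")], saveMetrics = TRUE). The sketch below does the same in base R on a small hypothetical data frame (toy), so it runs without caret. The rule used here (at most one distinct non-NA value) is an assumption meant to mirror caret's zeroVar column, not caret's own code.

```r
# Hypothetical deterministic data, not the set.seed(123) data above
toy <- data.frame(
  y  = c("no", "no", "no", "yes", "yes"),
  x1 = c(0, 0, NA, 1, 2),   # only 0/NA among the "no" rows
  x2 = c(0, 1, 0, 0, 0),    # varies among the "no" rows
  x3 = c(5, NA, 5, 7, 8)    # only 5/NA among the "no" rows
)

# Keep the "no" rows and drop the outcome column, which is constant by construction
no_rows <- toy[toy$y == "no", setdiff(names(toy), "y"), drop = FALSE]

# Assumed zero-variance rule: at most one distinct non-NA value per column
is_zero_var <- vapply(no_rows,
                      function(col) length(unique(col[!is.na(col)])) <= 1,
                      logical(1))

# vapply over a data frame returns a vector named by column
zero_var_names <- names(is_zero_var)[is_zero_var]
print(zero_var_names)
```

Here only x1 and x3 are reported; x2 varies within the "no" rows, and y never enters the check.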