
Removal of constant columns in R


I was using the prcomp function when I received this error

Error in prcomp.default(x, ...) : 
cannot rescale a constant/zero column to unit variance

I know I can scan my data manually, but is there a function or command in R that can remove these constant variables for me? I know this is a very simple task, but I have never come across a function that does this.

Thanks,


Solution

  • The problem here is that at least one column in your data has a variance of zero, so it cannot be rescaled to unit variance. You can check which columns of a data frame are constant this way, for example:

    df <- data.frame(x=1:5, y=rep(1,5))
    df
    #   x y
    # 1 1 1
    # 2 2 1
    # 3 3 1
    # 4 4 1
    # 5 5 1
    
    # Names of the columns that have zero variance
    # (index names(df) directly; df[, cols] would drop to a vector when only one column matches)
    names(df)[sapply(df, function(v) var(v, na.rm=TRUE)==0)]
    # [1] "y"
    

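    So if you want to exclude these columns, you can use the following (drop=FALSE keeps the result a data frame even when only one column remains):

    df[, sapply(df, function(v) var(v, na.rm=TRUE)!=0), drop=FALSE]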
    

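    EDIT: If every column is numeric, it is simpler to use apply instead. Note that apply converts the data frame to a matrix first, so a non-numeric column would break the variance calculation. Something like this:

    df[, apply(df, 2, var, na.rm=TRUE) != 0, drop=FALSE]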
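
To tie this back to the original prcomp error, here is a minimal sketch that wraps the check in a reusable function. The name remove_constant_cols and the extra column z are made up for illustration and are not part of the answer above. Instead of var, it counts distinct non-missing values, which is one possible way to also cope with non-numeric columns and with columns where var would return NA.

```r
# Hypothetical helper (not from the original answer): keep only the columns
# that take more than one distinct non-missing value.
remove_constant_cols <- function(data) {
  varies <- vapply(
    data,
    function(v) length(unique(v[!is.na(v)])) > 1,
    logical(1)
  )
  # drop = FALSE keeps a data frame even if a single column survives
  data[, varies, drop = FALSE]
}

# Example data: y is constant, x and z are not (z is an assumed extra column)
df <- data.frame(x = 1:5, y = rep(1, 5), z = c(2, 4, 1, 3, 5))

cleaned <- remove_constant_cols(df)
names(cleaned)
# [1] "x" "z"

# With the constant column gone, scaling to unit variance no longer fails
pca <- prcomp(cleaned, scale. = TRUE)
```

prcomp still needs numeric input, so any non-numeric columns that survive the filter would have to be dropped or encoded separately before the call.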