
Dealing with 0 and -Inf values in a data frame for pca() with M3C in R


I have a data frame with sample IDs as column names, gene names as row names, and a matrix of values (RNA-seq TPM). I want to run pca() from the M3C package. I first log2-transformed the values using:

df_log2 <- mutate_if(df, is.numeric, log2)

Then I restored the row names using:

rownames(df_log2) <- rownames(df)

However, when I tried the PCA I obtained the following error:

> PCA <- pca(df)
***PCA wrapper function***
running...
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

I went back and checked the transformed data frame, and some columns contain -Inf, the result of log2(0) for zeros in the original df.

Is the -Inf the problem? If so, how can I deal with it?


Solution

  • You could try a transformation that is defined at zero, such as log(x + 1), or a square root or cube root transformation (sqrt(x) and x^(1/3), respectively):

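    library(dplyr)

    # log(x + 1): natural log with a pseudocount of 1, so zeros map to 0
    # (use log2(x + 1) instead to stay on the log2 scale)
    df_log <- mutate_if(df, is.numeric, .funs = function(x) log(x + 1))

    # sqrt(x): square root, defined at 0
    df_sqrt <- mutate_if(df, is.numeric, .funs = function(x) sqrt(x))

    # x^(1/3): cube root, defined at 0
    df_cbrt <- mutate_if(df, is.numeric, .funs = function(x) x^(1/3))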
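
To see the effect of the pseudocount, here is a minimal base R sketch on a made-up TPM table (the values and the names tpm, raw_log2 and tpm_log2 are hypothetical, not taken from the question). It shows that log2(0) gives -Inf while log2(x + 1) stays finite, and it runs two checks before any PCA call. Note that -Inf is still a numeric value in R, so the sketch also checks for non-numeric columns, on the assumption that one of those could be what triggers the "'x' must be numeric" message.

```r
# Toy TPM table (hypothetical values): genes in rows, samples in columns.
tpm <- data.frame(
  sample_1 = c(0, 4, 16),
  sample_2 = c(1, 0, 8),
  row.names = c("geneA", "geneB", "geneC")
)

# Plain log2 turns every zero into -Inf.
raw_log2 <- log2(as.matrix(tpm))
n_inf <- sum(is.infinite(raw_log2))

# Adding a pseudocount of 1 before taking the log maps zeros to 0 instead.
# Going through a matrix keeps the row names, so no need to restore them.
tpm_log2 <- as.data.frame(log2(as.matrix(tpm) + 1))

# Sanity checks before passing the data to a PCA function:
# every column should be numeric and every value finite.
all_numeric <- all(vapply(tpm_log2, is.numeric, logical(1)))
all_finite <- all(is.finite(as.matrix(tpm_log2)))
```

If all_numeric is FALSE on your own data, inspect the column classes (for example with sapply(df, class)) and drop or convert the non-numeric columns before running the PCA.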