I have a data frame with sample IDs as column names, gene names as row names, and a matrix of values (RNA-seq TPM). I want to run pca() from the M3C package. I first log2-transformed the values using:
df_log2 <- mutate_if(df, is.numeric, log2)
Then I restored the row names using:
rownames(df_log2) <- rownames(df)
However, when I tried the PCA I obtained the following error:
> PCA <- pca(df)
***PCA wrapper function***
running...
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
I checked the data frame again, and some columns contain -Inf as a value, which comes from log2(0) on the original df.
Is the -Inf the problem? If so, how can I deal with it?
You could try a transformation that is defined at zero, such as log(x + 1), or a square root or cube root transformation, sqrt(x) and x^(1/3) respectively. None of these produces -Inf for zero values:
# log(x + 1); note that log() is the natural log, use log2(x + 1) to stay in base 2
df_log <- mutate_if(df, is.numeric, .funs = function(x){log(x + 1)})
# sqrt(x)
df_sqrt <- mutate_if(df, is.numeric, .funs = function(x){sqrt(x)})
# x^(1/3)
df_cbrt <- mutate_if(df, is.numeric, .funs = function(x){x^(1/3)})
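As a quick sanity check before running PCA, here is a minimal base-R sketch on a made-up toy table (the name tpm and its values are hypothetical, not your data). It shows that log2(x + 1) removes the -Inf values, and it also tests that every column is numeric. That second check may matter here: -Inf is still a numeric value in R, so the "'x' must be numeric" message could instead point to a non-numeric column, for example a gene-name column, in the object passed to pca(). That is an assumption, since your actual df is not shown.

```r
# Toy TPM table: genes in rows, samples in columns (values are made up).
tpm <- data.frame(
  sample_A = c(0, 3, 15, 127),
  sample_B = c(1, 0, 7, 255),
  row.names = c("gene1", "gene2", "gene3", "gene4")
)

# Plain log2 turns every zero into -Inf.
n_inf_plain <- sum(is.infinite(as.matrix(log2(tpm))))

# log2(x + 1) maps zero to zero and keeps every value finite.
# Base-R arithmetic on a data frame keeps the row names.
tpm_log2 <- log2(tpm + 1)

# Checks worth running before PCA: all columns numeric, all values finite.
all_numeric <- all(vapply(tpm_log2, is.numeric, logical(1)))
all_finite  <- all(is.finite(as.matrix(tpm_log2)))
```

If all_numeric is FALSE on your own data, str(df) will show which columns are not numeric. If all_finite is FALSE, some zeros (or missing values) are still being turned into non-finite numbers.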