r dataframe normalization

How to log(x + 1) a data frame or matrix in R


I am new to coding and am doing some gene expression analysis, so this may be a very naïve question. I have a few gene expression data frames with gene names as rows and cell names as columns (linked image: example gene expression data frame). I want to log2-transform the data, but I am confused about the difference between log and log + 1. How do I perform a log2(x + 1) transformation of a data frame in R? Is it the same as a log2 transformation? Should I do t = log(v + 1)? Any help will be appreciated.


Solution

  • Park's answer gives the simplest way to log-transform a numeric-only data.frame, but log(x + 1, base = b) is a different problem.

    log(x + 1)

    If the transformation is y <- log(x + 1) (possibly in base 2), beware of floating-point issues: for very small values of abs(x), x + 1 is rounded before the logarithm is taken, so the results of log(x + 1, base = b) lose accuracy.

    x <- seq(.Machine$double.eps, .Machine$double.eps^0.5, length.out = 10)
    eq <- log(x + 1) == log1p(x)
    eq
    #[1]  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE
    which(eq)
    #[1]  1  4  7 10
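
    An extreme case makes the problem easier to see. In the sketch below, `tiny` is an illustrative value (not taken from the question's data) chosen to be far below `.Machine$double.eps`, so that `tiny + 1` rounds to exactly 1 before `log` is ever applied.

    ```r
    # Illustrative value, far below .Machine$double.eps (about 2.2e-16)
    tiny <- 1e-20

    # The addition rounds to exactly 1, so the information in `tiny` is lost
    (tiny + 1) == 1      # TRUE

    # log() therefore receives exactly 1 and returns 0
    naive <- log(tiny + 1)

    # log1p() never forms tiny + 1, so the small value survives
    accurate <- log1p(tiny)

    naive == 0           # TRUE
    accurate > 0         # TRUE
    ```

    For values of x this small, log(x + 1) returns 0 regardless of x, while log1p(x) stays close to x.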
    

    This is why base R has a function log1p. To compute log(x + 1, base = 2) or, equivalently, log2(x + 1), use

    log2p1 <- function(x) log1p(x)/log(2)
    
    eq2 <- log2(x + 1) == log2p1(x)
    
    eq2
    # [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE
    which(eq2)
    #[1]  7 10
    

    In both cases, the difference between the naive result (log(x + 1) or log2(x + 1)) and its numerically more accurate counterpart is smaller in absolute value than .Machine$double.eps.

    abs(log(x + 1) - log1p(x)) < .Machine$double.eps
    # [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
    abs(log2(x + 1) - log2p1(x)) < .Machine$double.eps
    # [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
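
    Coming back to the question as asked, here is a minimal sketch of applying the transformation to a whole table. The data frame `expr`, its gene and cell names, and its values are made up for illustration; `log2p1` is the helper defined earlier in this answer, repeated so the block runs on its own. The sketch assumes every column is numeric.

    ```r
    # Helper from this answer, repeated so the block is self-contained
    log2p1 <- function(x) log1p(x) / log(2)

    # Hypothetical expression table: genes as rows, cells as columns
    expr <- data.frame(
      cell1 = c(0, 1, 3),
      cell2 = c(7, 0, 15),
      row.names = c("geneA", "geneB", "geneC")
    )

    # Convert to a numeric matrix first; log1p() is vectorised and
    # keeps the dimensions and the row/column names
    expr_mat <- as.matrix(expr)
    log_mat <- log2p1(expr_mat)

    # Convert back if downstream code expects a data frame
    log_df <- as.data.frame(log_mat)
    log_df
    ```

    Note that this is log2(x + 1), not log2(x): a count of 0 maps to 0 instead of -Inf, which is the usual reason for adding 1 before taking logs of count data.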