I am new to coding and am doing some gene expression analysis. I have a very naïve question. I have a few gene expression data frames with gene names as rows and cell names as columns (image: example gene expression data frame). I want to log2 transform the data, but am confused between log and log + 1. How do I perform a log2(x + 1) transformation of a data frame in R? Is it the same as a log2 transformation? Should I do t=log(v+1)?
Any help will be appreciated.
Park's answer gives the simplest way to log transform a numeric-only data.frame, but log(x + 1, base = b) is a different problem.
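For reference, here is a minimal sketch of that simple approach. It assumes a data frame whose columns are all numeric; the gene names, cell names and values are made up for illustration, and this is not necessarily Park's exact code.

```r
# Hypothetical toy data: genes as rows, cells as columns (values are made up)
expr <- data.frame(
  cell1 = c(0, 5, 120),
  cell2 = c(3, 0, 48),
  row.names = c("geneA", "geneB", "geneC")
)

# Add a pseudocount of 1, then take log base 2, element by element.
# This works because "+" and log2() apply column-wise to an all-numeric
# data.frame; row and column names are kept.
expr_log <- log2(expr + 1)
expr_log
```

The + 1 is what lets zeros through: a zero count becomes log2(0 + 1) = 0, whereas log2(0) on its own is -Inf.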
log(x + 1)
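But if the transformation is y <- log(x + 1) (possibly base 2), then beware of floating-point issues. For very small values of abs(x), the results of log(x + 1, base = b) are unreliable: x + 1 is rounded to the nearest representable double before the logarithm is taken, so the low-order digits of x are lost. For positive x smaller than .Machine$double.eps/2, the sum rounds to exactly 1 and the result is 0.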
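A small illustration of the problem, using the arbitrary value x_tiny = 1e-17 (chosen only because it is below .Machine$double.eps/2) and base R's log1p(), which computes log(1 + x) and is designed to stay accurate for tiny x:

```r
x_tiny <- 1e-17

# 1 + x_tiny is rounded to the nearest double, which is exactly 1
(1 + x_tiny) == 1

# So the naive formula returns 0 and loses x_tiny entirely
log(1 + x_tiny)

# log1p() keeps the information: the result is about 1e-17
log1p(x_tiny)
```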
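# 10 evenly spaced values from .Machine$double.eps up to its square root
x <- seq(.Machine$double.eps, .Machine$double.eps^0.5, length.out = 10)
# TRUE where the naive log(x + 1) agrees exactly with log1p(x)
eq <- log(x + 1) == log1p(x)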
eq
#[1] TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE
which(eq)
#[1] 1 4 7 10
This is why base R has a function log1p, which computes log(1 + x) accurately even when abs(x) is much smaller than 1. To compute log(x + 1, base = 2), or equivalently log2(x + 1), use
# log2(1 + x) via the change-of-base formula: log1p(x) / log(2)
log2p1 <- function(x) log1p(x)/log(2)
eq2 <- log2(x + 1) == log2p1(x)
eq2
# [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE
which(eq2)
#[1] 7 10
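The same helper can be applied to an expression data frame. A minimal sketch, assuming a hypothetical data frame that has a character column of gene names next to the numeric cell columns; only the numeric columns are transformed:

```r
# Accurate log2(1 + x) via the change-of-base formula (same helper as above)
log2p1 <- function(x) log1p(x) / log(2)

# Hypothetical toy data: a gene-name column plus numeric cell columns
expr <- data.frame(
  gene  = c("geneA", "geneB", "geneC"),
  cell1 = c(0, 5, 120),
  cell2 = c(3, 0, 48)
)

# Find the numeric columns and transform only those, leaving 'gene' untouched
num_cols <- vapply(expr, is.numeric, logical(1))
expr[num_cols] <- lapply(expr[num_cols], log2p1)
expr
```

If every column is numeric, log2p1(expr) also works directly, since log1p() and division by a scalar both apply element-wise to an all-numeric data.frame.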
In both cases the difference between the naive log(x + 1) (or log2(x + 1)) and the numerically more accurate version is smaller in absolute value than .Machine$double.eps.
abs(log(x + 1) - log1p(x)) < .Machine$double.eps
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
abs(log2(x + 1) - log2p1(x)) < .Machine$double.eps
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
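These are absolute differences. Because the values of x here are themselves tiny (from about 2.2e-16 up to about 1.5e-8), it can also help to look at the relative error. A minimal sketch reusing the same x and helper; the names abs_err and rel_err are just illustrative:

```r
x <- seq(.Machine$double.eps, .Machine$double.eps^0.5, length.out = 10)
log2p1 <- function(x) log1p(x) / log(2)

# Absolute error of the naive formula, as in the comparison above
abs_err <- abs(log2(x + 1) - log2p1(x))

# Relative error: absolute error divided by the accurate result
rel_err <- abs_err / log2p1(x)

# Inspect the relative errors and compare them with machine epsilon
signif(rel_err, 3)
rel_err > .Machine$double.eps
```

Roughly speaking, rounding x + 1 for 0 <= x < 1 changes the argument by at most .Machine$double.eps/2, so the relative error matters most when x is close to 0 but not 0; for x = 0 both versions return exactly 0.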