Tags: r, scale, transformation, heatmap

Understanding `scale` in R


I'm trying to understand the definition of scale that R provides. I have data (mydata) that I want to make a heat map with, and there is a VERY strong positive skew. I've created a heatmap with a dendrogram for both scale(mydata) and log(mydata), and the two dendrograms are different. Why? What does it mean to scale my data, versus log transform my data? And which would be more appropriate if I want to look at the dendrogram illustrating the relationship between the columns of my data?

Thank you for any help! I've read the definitions but they are whooshing over my head.


Solution

  • log simply takes the natural logarithm (base e, by default) of each element of the vector.
    scale, with default settings, calculates the mean and standard deviation of the vector, then "scales" each element by subtracting the mean and dividing by the standard deviation. When given a matrix or data frame, it does this separately for each column rather than for the data as a whole. (If you use scale(x, scale=FALSE), it will only subtract the mean and not divide by the standard deviation.)

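    Note that the following two approaches give the same values (scale returns them as a one-column matrix, with the centering and scaling values attached as attributes):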

       set.seed(1)
       x <- runif(7)
    
       # Manually scaling
       (x - mean(x)) / sd(x)
    
       scale(x)
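
To connect this back to the question about mydata, here is a minimal sketch of how the two transformations might behave on skewed matrix data. The matrix, its column names, and the skewness helper are all made up for illustration (the original data was not shown), and the example assumes strictly positive values so that log is defined. The idea is that scale is a linear shift and rescale of each column, so it should leave the shape of the distribution (including the skew) unchanged, whereas log is nonlinear and compresses large values. That difference is a likely reason the distances, and hence the dendrograms, differ.

```r
# Hypothetical skewed data: 50 rows, 3 columns (names are illustrative only)
set.seed(1)
mydata <- matrix(rlnorm(150, meanlog = 0, sdlog = 1.5), ncol = 3,
                 dimnames = list(NULL, c("a", "b", "c")))

# scale() standardises each column separately: subtract the column mean,
# then divide by the column standard deviation
scaled   <- scale(mydata)
centered <- sweep(mydata, 2, colMeans(mydata), "-")
manual   <- sweep(centered, 2, apply(mydata, 2, sd), "/")
all.equal(scaled, manual, check.attributes = FALSE)

# Simple moment-based skewness (a helper defined here, not a base R function)
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

apply(mydata, 2, skewness)       # skew of the raw columns
apply(scaled, 2, skewness)       # same skew: scaling does not change the shape
apply(log(mydata), 2, skewness)  # skew after the log transform

# Column dendrograms: transpose so that columns become the clustered items
hc_scaled <- hclust(dist(t(scaled)))
hc_logged <- hclust(dist(t(log(mydata))))
plot(hc_scaled, main = "Columns, scaled")
plot(hc_logged, main = "Columns, log-transformed")
```

Which transformation is more appropriate depends on the data. With a very strong positive skew, a log transform (possibly followed by scale) is a common choice, because distances computed on the scaled but still skewed values can be dominated by a handful of extreme observations.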