
Converting raw data to Gaussian (normal) using qqnorm gives different normal values for the same raw data value


I want to convert raw data to Gaussian (mean = 0, sd = 1) using the qqnorm function. What I noticed, though, is that the same raw value can map to different Gaussian values. E.g.:

mydata <- c(2.4, 3.7, 2.1, 3, 1.6, 2.5, 2.9, 2.9)
# qqnorm() returns a list; $x holds the theoretical normal quantiles
myquant <- qqnorm(mydata)$x
myquant
[1] -0.4727891  1.4342002 -0.8524950  0.8524950 -1.4342002 -0.1525060  0.1525060  0.4727891

Moreover, I have used the following code to transform the data to normal:

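for (i in seq_len(ncol(sampledataSubGaus))) {
  # plot.it = FALSE skips the Q-Q plot that qqnorm() would otherwise draw per column
  sampledataSubGaus[, i] <- qqnorm(as.matrix(sampledataSub[, i]), plot.it = FALSE)$x
}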

where I face the same issue again. Is there an explanation for that? For your information, I have used another function called score.transform, which behaves properly.


Solution

  • I am not quite sure what you mean by "converting" your data to N(0,1) using qqnorm(). The qqnorm() function returns x, the theoretical normal quantiles matched to the ranks of your data values. Internally, qqnorm() does essentially the following:

    mydata <- c(2.4, 3.7, 2.1, 3, 1.6, 2.5, 2.9, 2.9)
    y <- mydata
    n <- length(y)
    # normal quantiles for n points, rearranged into the original data order
    x <- qnorm(ppoints(n))[order(order(y))]
    plot(x, y)
    


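    If you took a subset of these values, you would get different values of x, because a different number of points would be used to generate the normal quantiles (i.e., the values of ppoints(n) would be different). Note too that order(order(y)) gives tied values distinct ranks, so the two 2.9 entries in mydata receive different quantiles (0.1525060 and 0.4727891).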
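
    A short sketch that checks this on the question's data: the same raw value gets a different quantile when n changes, and tied values get different quantiles because their ranks differ. The last part is a hypothetical variant, not something qqnorm() does: it averages the ranks of ties (via rank() with ties.method = "average") and reuses the default ppoints() offset, so equal raw values share one quantile.

    ```r
    mydata <- c(2.4, 3.7, 2.1, 3, 1.6, 2.5, 2.9, 2.9)

    # Full sample: quantiles come from ppoints(8)
    full <- qqnorm(mydata, plot.it = FALSE)$x

    # First four values only: quantiles come from ppoints(4)
    sub <- qqnorm(mydata[1:4], plot.it = FALSE)$x

    # The raw value 2.4 (position 1) maps to a different quantile in each case
    c(full = full[1], subset = sub[1])

    # The tied 2.9 entries (positions 7 and 8) get ranks 5 and 6, hence two quantiles
    full[7:8]

    # Hypothetical tie-aware variant (an assumption, not qqnorm's behaviour):
    # average the ranks of ties, then apply the same plotting-position formula
    n <- length(mydata)
    a <- if (n <= 10) 3 / 8 else 1 / 2   # default offset used by ppoints()
    r <- rank(mydata, ties.method = "average")
    tied <- qnorm((r - a) / (n + 1 - 2 * a))

    # Both 2.9 entries now share a single value
    tied[7:8]
    ```

    For values without ties, tied agrees with the output of qqnorm(); only the tied entries change. The result still depends on n, so it remains specific to the sample it was computed from.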

    I could be wrong, but I have never heard of someone using qqnorm() to transform data; it is a diagnostic for normality, not a remedy. Something like a Box-Cox transformation could, under the right circumstances, help transform a skewed variable into something closer to normally distributed.
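
    As a rough illustration of the Box-Cox idea, the sketch below uses boxcox() from the MASS package on simulated data. The data (skewed), the lambda grid, and the intercept-only model are assumptions made for the example; they are not taken from the question. Box-Cox requires strictly positive values.

    ```r
    set.seed(1)
    skewed <- rlnorm(200)   # hypothetical right-skewed, strictly positive data

    # Profile the log-likelihood over a grid of lambda values (no plot)
    bc <- MASS::boxcox(skewed ~ 1, lambda = seq(-2, 2, by = 0.1), plotit = FALSE)

    # Pick the grid value of lambda with the highest log-likelihood
    lambda <- bc$x[which.max(bc$y)]

    # Apply the Box-Cox transform; lambda = 0 corresponds to the log transform
    transformed <- if (abs(lambda) < 1e-8) {
      log(skewed)
    } else {
      (skewed^lambda - 1) / lambda
    }

    # Compare normality diagnostics before and after
    shapiro.test(skewed)$p.value
    shapiro.test(transformed)$p.value
    ```

    Unlike the rank-based quantiles from qqnorm(), this is a fixed function of the raw value once lambda is chosen, so equal raw values always map to equal transformed values.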