I have a dataset that I need to transform so that it follows a normal distribution.
First, generate a reproducible dataset:
df <- runif(500, 0, 100)
Second, define a function. This function keeps transforming df until the Shapiro-Wilk p-value is no longer below the cutoff used in the loop (0.10). The transformed data is assigned to y in the global environment.
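library(car)  # provides powerTransform()
BoxCoxTrans <- function(y)
{
  lambda <- 1
  constant <- 0
  # Repeat until the Shapiro-Wilk p-value is no longer below 0.10
  while(shapiro.test(y)$p.value < 0.10)
  {
    # Shift the data so that every value is strictly positive
    constant <- abs(min(y, na.rm = TRUE)) + 0.001
    y <- y + constant
    # Estimate the Box-Cox power and apply it
    lambda <- powerTransform(y)$lambda
    y <- y ^ lambda
  }
  # Store the transformed data as y in the global environment
  assign("y", y, envir = .GlobalEnv)
}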
Third, test df for normality:
shapiro.test(df)
Shapiro-Wilk normality test
data: df
W = 0.95997, p-value = 2.05e-10
Because P < 0.05, transform df:
BoxCoxTrans(df)
This gives me the following error message:
Error in qr.resid(xqr, w * fam(Y, lambda, j = TRUE)) :
NA/NaN/Inf in foreign function call (arg 5)
What did I do wrong?
You could use a Box-Muller transform to generate normally distributed values from pairs of independent, uniformly distributed random numbers. This might be more appropriate than a Box-Cox transformation, which AFAIK is typically applied to make a skewed distribution approximately normal.
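To illustrate the kind of skewed data a Box-Cox transformation is usually meant for, here is a minimal base-R sketch. It assumes log-normal sample data, and the helpers boxcox_transform and boxcox_loglik are names made up for this illustration (they are not part of car or any other package). The power is estimated once by maximising the profile log-likelihood, rather than re-estimated in a loop.

```r
set.seed(42)                              # arbitrary seed, for repeatability only
x <- rlnorm(500, meanlog = 0, sdlog = 1)  # assumed example data: right-skewed, strictly positive

# Box-Cox transform: (x^lambda - 1) / lambda, with log(x) as the lambda = 0 limit
# (hypothetical helper, written for this sketch)
boxcox_transform <- function(x, lambda) {
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

# Profile log-likelihood of lambda under a normal model for the transformed data
# (hypothetical helper, written for this sketch)
boxcox_loglik <- function(lambda, x) {
  z <- boxcox_transform(x, lambda)
  n <- length(x)
  -n / 2 * log(mean((z - mean(z))^2)) + (lambda - 1) * sum(log(x))
}

# Estimate lambda once over an assumed search interval of [-2, 2]
lambda_hat <- optimize(boxcox_loglik, interval = c(-2, 2), x = x,
                       maximum = TRUE)$maximum
z <- boxcox_transform(x, lambda_hat)

shapiro.test(x)$p.value  # raw skewed data
shapiro.test(z)$p.value  # transformed data, usually much closer to normal
```

For log-normal input the estimated power should land near 0, which corresponds to a log transform. Uniform data such as runif(500, 0, 100) is not skewed in this way, so a power transform has little to work with.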
Here's an example of a Box-Muller Transformation applied to a set of uniformly distributed numbers:
set.seed(1234)
size <- 5000
a <- runif(size)
b <- runif(size)
y <- sqrt(-2 * log(a)) * cos(2 * pi * b)
plot(density(y), main = "Example of Box-Muller Transformation", xlab="x", ylab="f(x)")
library(nortest)
#> lillie.test(y)
#
# Lilliefors (Kolmogorov-Smirnov) normality test
#
#data: y
#D = 0.009062, p-value = 0.4099
#
#> shapiro.test(y)
#
# Shapiro-Wilk normality test
#
#data: y
#W = 0.99943, p-value = 0.1301
#
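If the starting point is a single uniform sample like the one in the question, the same idea could be sketched as follows. This is only an illustration under some assumptions: the data is known to be uniform on (0, 100), the seed is arbitrary, and the sample is split into two halves that play the roles of a and b. Using both the cosine and the sine branch gives back as many values as went in.

```r
set.seed(1234)                  # arbitrary seed so the sketch is repeatable
df <- runif(500, 0, 100)        # same shape as the dataset in the question

# Box-Muller needs values strictly inside (0, 1), so rescale from (0, 100)
u <- df / 100

# Split the sample into two halves that play the roles of a and b
half <- length(u) / 2
a <- u[1:half]
b <- u[(half + 1):length(u)]

# Each (a, b) pair yields two independent standard normal values
r <- sqrt(-2 * log(a))
z <- c(r * cos(2 * pi * b), r * sin(2 * pi * b))

length(z)                 # same number of values as df
shapiro.test(z)$p.value   # inspect the normality test result
```

Note that this does not map each original observation to a transformed one in an order-preserving way, so it only fits if the goal is to obtain normally distributed values rather than to transform specific measurements.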
Hope this helps.