Tags: r, csv, lambda, transformation, r-car

Estimating Lambda for Yeo and Johnson transform


I have a time series of rainfall values in a csv file. I plotted a histogram of the data, and it is skewed to the left. I want to transform the values so that they are approximately normally distributed, so I used the Yeo-Johnson transform available in R. The transformed values are here.

My question is:

In the above transformation, I used a test value of 0.5 for lambda, which works fine. Is there a way to determine the optimal value of lambda from the time series itself? I'd appreciate any suggestions.

So far, here's the code:

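library(car)
# Read the rainfall values into a numeric vector
dat <- scan("Zamboanga.csv")
hist(dat)
# Yeo-Johnson transform with a trial lambda of 0.5
trans <- yjPower(dat, 0.5, jacobian.adjusted = TRUE)
hist(trans)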

Here is the csv file.


Solution

  • First, estimate the optimal λ by maximum likelihood using the boxCox function from the car package.

    You can plot it like this:

    boxCox(your_model, family="yjPower", plotit = TRUE)
    

    An example from Cross Validated (CV)

    As Ben Bolker said in a comment, the model here could be something like

    your_model <- lm(dat~1)
    

    Then use the optimized lambda in your existing code.
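As a rough sketch of that last step: according to the car documentation, boxCox returns a list holding the lambda grid (x) and the profile log-likelihood (y), so the λ with the highest log-likelihood can be picked off the grid and passed to yjPower. The simulated dat below is only a stand-in for the rainfall series (an assumption, since the csv is not reproduced here), and bc and lambda_hat are illustrative names.

```r
library(car)

# Stand-in data (assumption): simulated right-skewed values.
# With the real series, use: dat <- scan("Zamboanga.csv")
set.seed(1)
dat <- rgamma(200, shape = 2, scale = 50)

# Intercept-only model, as suggested in the answer
your_model <- lm(dat ~ 1)

# Evaluate the profile log-likelihood on a fine lambda grid without plotting
bc <- boxCox(your_model, family = "yjPower",
             lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# Grid value of lambda that maximises the log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]

# Reuse the original transformation code with the estimated lambda
trans <- yjPower(dat, lambda_hat, jacobian.adjusted = TRUE)
hist(trans)
```

The estimate is only as precise as the grid spacing (0.01 here), and if it lands on either end of the grid the lambda range probably needs widening.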