I am trying to use boxcox (from MASS) to normalise my data, but the model I end up with cannot predict at the conditions I want. Why is this happening?
I have a dataframe:
a<-data.frame(Output=c(0.065,8.00,2.320,0.128,42.500,35.200,18.200,2.94,1.68,13.90,43.50,3.810,2.600),
Carbon=c(20.0,22.5,10.0,7.0,35.0,20.,35.0,2.0,10.0,25.0,30.0,10.0,8.0),
Cooling=c(0.0,50.0,12.0,0.0,12.70,12.70,5.0,2.0,0.00,0.00,12.70,10.00,14.69),
Drying=c(0.0,70.00,0.00,0.00,0.90,0.90,0.90,55.80,0.00,0.00,0.90,15.00,35.56))
Using the following libraries:
library(MASS)
I ran the following code:
bc<-boxcox(a$Output~a$Cooling*a$Drying+a$Carbon)
lambda<-bc$x[which.max(bc$y)]
new.model<-lm(((a$Output^lambda-1)/lambda)~a$Drying*a$Cooling+a$Carbon)
There are zeros in the dataset, and I want to transform the data to get normality. I then want to build a predictive model and predict "Output" for the following condition: Carbon=2, Cooling=10, Drying=20
However, I keep getting NaNs in my output. Have I done the transformation incorrectly, or is the model flawed?
I think you should not use $ inside the formula the way you have. If you do, the coefficients are named like a$some_variable, whereas the columns in your test record are named some_variable, not a$some_variable, so predict() cannot match them. You can try the approach below. Please let me know if it fixes your issue.
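library(MASS)
# Profile the Box-Cox log-likelihood using bare column names and data=a
bc <- boxcox(Output ~ Cooling * Drying + Carbon, data = a)
# Pick the lambda on the grid with the highest log-likelihood
lambda <- bc$x[which.max(bc$y)]
a$lambda <- lambda
# Fit on the Box-Cox-transformed response
new.model <- lm(((Output^lambda - 1) / lambda) ~ Drying * Cooling + Carbon, data = a)
# The prediction is on the transformed scale
predict(new.model, data.frame(Carbon = 2, Cooling = 10, Drying = 10, lambda = lambda))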
Output:
1
0.1812739866
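The value above is on the Box-Cox-transformed scale, so it likely needs to be mapped back to the original units before it can be read as Output. Here is a minimal sketch of the inverse transform, assuming the same (y^lambda - 1)/lambda form used in the lm call. The names boxcox_fwd and boxcox_inv and the value of lambda_demo are hypothetical, introduced only for illustration; they are not part of MASS.

```r
# Hypothetical helpers (not provided by MASS)
boxcox_fwd <- function(y, lambda) {
  # Box-Cox transform; the log is the limiting case as lambda -> 0
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

boxcox_inv <- function(z, lambda) {
  # Inverse of the transform above
  if (abs(lambda) < 1e-8) exp(z) else (lambda * z + 1)^(1 / lambda)
}

lambda_demo <- 0.3           # illustrative value only, not the fitted lambda
y <- c(0.065, 8, 42.5)       # a few Output values from the question
z <- boxcox_fwd(y, lambda_demo)
y_back <- boxcox_inv(z, lambda_demo)

# The round trip should recover the original values up to floating-point error
all.equal(y, y_back)
```

With the fitted model, this would be applied as boxcox_inv(predict(new.model, newdata), lambda). If lambda * z + 1 comes out negative, the fractional power returns NaN in R, which is one possible source of NaNs when back-transforming.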
A look at what happens when you use the $ approach with lm:
Coefficients:
                       Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)        -3.141173410 1.342601277 -2.33962 0.0474440 *
a$Drying            0.060882585 0.039681152  1.53429 0.1635024
a$Cooling           0.275926915 0.102135431  2.70158 0.0270079 *
a$Carbon            0.219900733 0.059038120  3.72472 0.0058317 **
a$Drying:a$Cooling -0.004854491 0.001593430 -3.04657 0.0159038 *
Without $, however, it looks like this:
Coefficients:
                   Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    -3.141173410 1.342601277 -2.33962 0.0474440 *
Drying          0.060882585 0.039681152  1.53429 0.1635024
Cooling         0.275926915 0.102135431  2.70158 0.0270079 *
Carbon          0.219900733 0.059038120  3.72472 0.0058317 **
Drying:Cooling -0.004854491 0.001593430 -3.04657 0.0159038 *
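The estimates are identical in both fits; only the term names differ, and that naming is what trips up predict(). Here is a minimal sketch on toy data showing the effect. The data frame d and the object names are hypothetical and are not the asker's data.

```r
# Toy data (hypothetical, for illustration only)
d <- data.frame(y = c(1.2, 2.3, 2.9, 4.1, 5.2), x = c(1, 2, 3, 4, 5))

fit_dollar <- lm(d$y ~ d$x)         # variables referenced through d$
fit_data   <- lm(y ~ x, data = d)   # variables looked up in `data`

names(coef(fit_dollar))   # second term is named "d$x"
names(coef(fit_data))     # second term is named "x"

new_obs <- data.frame(x = 10)

# With d$x in the formula, predict() cannot use column `x` of new_obs;
# it re-evaluates d$x and returns one value per original row (with a warning)
p_dollar <- suppressWarnings(predict(fit_dollar, newdata = new_obs))
length(p_dollar)

# With bare names and data=, predict() uses new_obs and returns one value
p_data <- predict(fit_data, newdata = new_obs)
length(p_data)
```

In the first case the result has as many elements as the training data, which means newdata was effectively ignored; in the second case there is a single prediction for the single new row.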