
Normality test for polynomial regression


In R, I fitted a polynomial regression to the data set below. The R2 is good, and the p-values for both the coefficients and the overall model are less than 0.05. But when I run shapiro.test on the residuals, the p-value is 0.01088, which suggests that the residuals are not normally distributed. So I wonder whether the polynomial regression is valid or not. Do the residuals of a polynomial regression have to satisfy the normality assumption?

Attached below are the code and the data used for regression.

alloy<-data.frame(
  x=c(37.0, 37.5, 38.0, 38.5, 39.0, 39.5, 40.0,
      40.5, 41.0, 41.5, 42.0, 42.5, 43.0),
  y=c(3.40, 3.00, 3.00, 3.27, 2.10, 1.83, 1.53,
      1.70, 1.80, 1.90, 2.35, 2.54, 2.90))

lm.sol=lm(y~x+I(x^2),data=alloy)
summary(lm.sol)

y.res=lm.sol$residuals
shapiro.test(y.res)

Solution

  • Well ... this question probably belongs on Cross Validated (stats.stackexchange.com), since it has little to do with programming. However, here's my brief take on your data.

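    R2 and shapiro.test address different features of the data and the model fit, so one can be "good" while the other is not (for sufficiently vague definitions of "good" and "not"). R2 measures how much of the variation in y the model explains, whereas shapiro.test checks whether the residuals are consistent with a normal distribution.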
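
    To make the distinction concrete, here is a minimal sketch that pulls both numbers out of the same fit. It re-creates the alloy data and the model from the question so that it runs on its own; the names fit, r2 and sw are only illustrative and do not appear in the question.

    ```r
    # Data and model copied from the question
    alloy <- data.frame(
      x = c(37.0, 37.5, 38.0, 38.5, 39.0, 39.5, 40.0,
            40.5, 41.0, 41.5, 42.0, 42.5, 43.0),
      y = c(3.40, 3.00, 3.00, 3.27, 2.10, 1.83, 1.53,
            1.70, 1.80, 1.90, 2.35, 2.54, 2.90))
    fit <- lm(y ~ x + I(x^2), data = alloy)

    # R^2: share of the variance in y explained by the fitted curve
    r2 <- summary(fit)$r.squared

    # Shapiro-Wilk: tests whether the residuals look like a normal sample
    sw <- shapiro.test(resid(fit))

    cat("R-squared:", round(r2, 3), "\n")
    cat("Shapiro-Wilk p-value:", round(sw$p.value, 5), "\n")
    ```

    The first number describes how well the curve follows the data; the second describes the shape of what is left over. Neither one implies the other.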

    If you plot your data and your fit in the same graph then you see that the overall trend is nicely captured by your quadratic regression model.

    plot(y ~ x, data=alloy)
    lines(alloy$x, predict(lm.sol))
    


    The model does quite nicely. You can also see from the qq-plot of the residuals that there might be a problem, possibly with variance homogeneity (see the last, i.e. largest, residual at the top right of the plot).

    qqnorm(resid(lm.sol))
    


    In other words, the residuals may not necessarily follow a Gaussian distribution but the overall trend in the data is captured.
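
    If you want to see which observation is responsible, a sketch along the following lines may help. It again rebuilds the data from the question so that it is self-contained. rstandard(), qqline() and fitted() are base R functions; the names fit, rs and worst are only illustrative.

    ```r
    # Data and model copied from the question
    alloy <- data.frame(
      x = c(37.0, 37.5, 38.0, 38.5, 39.0, 39.5, 40.0,
            40.5, 41.0, 41.5, 42.0, 42.5, 43.0),
      y = c(3.40, 3.00, 3.00, 3.27, 2.10, 1.83, 1.53,
            1.70, 1.80, 1.90, 2.35, 2.54, 2.90))
    fit <- lm(y ~ x + I(x^2), data = alloy)

    # Standardized residuals put every observation on a comparable scale
    rs <- rstandard(fit)

    # Row index of the observation with the largest absolute standardized residual
    worst <- unname(which.max(abs(rs)))
    print(alloy[worst, ])

    # qq-plot with a reference line: points far from the line depart from normality
    qqnorm(resid(fit))
    qqline(resid(fit))

    # Residuals against fitted values: a funnel shape would suggest unequal variance
    plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
    abline(h = 0, lty = 2)
    ```

    The qq-plot speaks to normality, while the residuals-versus-fitted plot is the more direct check for variance homogeneity, so it is worth looking at both.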

    Did that help?