Search code examples
rplotstatisticsregressionnon-linear-regression

half-normal plot for a nonlinear model using the nls() function


I fitted a non-linear model to a dataset.

However, I need to perform the half-normal plot of this model (using the hnp() function from the hnp package).

My model was fitted considering the nls() function from the stats package, see:


nonlinear_func <- function(x, beta_0, beta_1) {
  return (beta_0 * (1 - exp(beta_1 * x)) + 1.30)
}

nonlinear_model <- nls(y ~ nonlinear_func(x, beta_0, beta_1), 
            start = list(beta_0 = 16, beta_1 = -0.2), 
            data.set)

The dataset follow:

data.set <- structure(list(x = c(13.05, 6.05, 13.21, 9.55, 18.14, 9.55, 14.48, 
15.28, 9.87, 15.92, 12.41, 12.41, 12.57, 15.12, 10.66, 16.87, 
12.57, 15.92, 9.71, 15.92, 17.35, 6.37, 11.94, 11.14, 8.91, 13.05, 
17.67, 10.66, 17.19, 7, 10.82, 11.62, 16.71, 18.3, 11.78, 12.89, 
10.82, 9.23, 14.32, 7.64, 5.09, 15.44, 10.35, 8.91, 14.32, 13.21, 
8.91, 15.6, 14.16, 15.28, 12.57), y = c(16.8, 11.3, 16.9, 11.8, 
17.5, 15.8, 20, 19.1, 15.8, 18.8, 18.6, 18.4, 18.4, 18.7, 15.2, 
18.3, 15.7, 17.5, 14.8, 16.7, 19.8, 10.3, 18.6, 14.4, 14.3, 17.8, 
21, 18.8, 18.9, 13.7, 17.6, 17.5, 19.6, 18.8, 15.3, 17, 15.9, 
13.3, 17.3, 13.6, 9.3, 17.7, 14.2, 14.9, 18.4, 18.2, 14.3, 19.7, 
18.6, 18.1, 15.5)), row.names = c(17L, 18L, 19L, 20L, 21L, 22L, 
23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 
36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 
49L, 50L, 51L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 
12L, 13L, 14L, 15L, 16L), class = "data.frame")

Ok, when reading the information from the hnp package (https://cran.r-project.org/web/packages/hnp/hnp.pdf) p.10, p.14 and p.15, you can see that there is no implementation associated with the nls() function, however it is possible to build the graph through a few steps, see for example this example using an adjusted model in gamlss() (another function that is not implemented within the hnp package):

## Example no. 2: Implementing gamma model using package gamlss
# load package
library(gamlss)
# model fitting
y <- rGA(30, mu=rep(c(.5, 1.5, 5), each=10), sigma=.5)
tr <- gl(3, 10)
fit2 <- gamlss(y ~ tr, family=GA)
# diagfun
d.fun <- function(obj) resid(obj) # this is the default if no
                                  # diagfun is provided
# simfun
s.fun <- function(n, obj) {
  mu <- obj$mu.fv
  sig <- obj$sigma.fv
  rGA(n, mu=mu, sigma=sig)
}
# fitfun
my.data <- data.frame(y, tr)
f.fun <- function(y.) gamlss(y. ~ tr, family=GA, data=my.data)
# hnp call
hnp(fit2, newclass=TRUE, diagfun=d.fun, simfun=s.fun,
    fitfun=f.fun, data=data.frame(y, tr))

Considering this information, I tried to do the same for my non-linear model, see:

d.fun <- function(obj) resid(obj)

s.fun <- function(n, obj) {
}

f.fun <- function(data) {
  nls(y ~ nonlinear_func(x, beta_0, beta_1), 
      start = list(beta_0 = 16, beta_1 = -0.2), 
      data = data.set)
}
library(hnp)
hnp(nonlinear_model, newclass = TRUE, 
diagfun = d.fun, simfun = s.fun, 
fitfun = f.fun, data = data.set)

However, notice that something wrong is happening, as the simulation envelope is not taken into account when creating the graph, see:

enter image description here

In the car package there is a function called qqPlot that executes the graph below (which is not the half-normal plot), however I needed to use the hnp package exclusively.

library(car)

qqPlot(residuals(nonlinear_model), pch = 19, col = "green", cex = 1.2, 
       xlab = "Norm Quantiles", 
       ylab = "Residuals")

enter image description here


Solution

  • There are a couple of issues in your translation from the gamlss functions to your own problem. One of them will require some thinking about exactly how it should work, but I've put together a potential answer. First, read in the data.

    data.set <- structure(list(x = c(13.05, 6.05, 13.21, 9.55, 18.14, 9.55, 14.48, 
    15.28, 9.87, 15.92, 12.41, 12.41, 12.57, 15.12, 10.66, 16.87, 
    12.57, 15.92, 9.71, 15.92, 17.35, 6.37, 11.94, 11.14, 8.91, 13.05, 
    17.67, 10.66, 17.19, 7, 10.82, 11.62, 16.71, 18.3, 11.78, 12.89, 
    10.82, 9.23, 14.32, 7.64, 5.09, 15.44, 10.35, 8.91, 14.32, 13.21, 
    8.91, 15.6, 14.16, 15.28, 12.57), y = c(16.8, 11.3, 16.9, 11.8, 
    17.5, 15.8, 20, 19.1, 15.8, 18.8, 18.6, 18.4, 18.4, 18.7, 15.2, 
    18.3, 15.7, 17.5, 14.8, 16.7, 19.8, 10.3, 18.6, 14.4, 14.3, 17.8, 
    21, 18.8, 18.9, 13.7, 17.6, 17.5, 19.6, 18.8, 15.3, 17, 15.9, 
    13.3, 17.3, 13.6, 9.3, 17.7, 14.2, 14.9, 18.4, 18.2, 14.3, 19.7, 
    18.6, 18.1, 15.5)), row.names = c(17L, 18L, 19L, 20L, 21L, 22L, 
    23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 
    36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 
    49L, 50L, 51L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 
    12L, 13L, 14L, 15L, 16L), class = "data.frame")
    

    Next, define the non-linear function of x and the model.

    nonlinear_func <- function(x, beta_0, beta_1) {
      return (beta_0 * (1 - exp(beta_1 * x)) + 1.30)
    }
    
    nonlinear_model <- nls(y ~ nonlinear_func(x, beta_0, beta_1), 
                           start = list(beta_0 = 16, beta_1 = -0.2), 
                           data.set)
    

    The residual function should probably work the way you have it written.

    d.fun <- function(obj) resid(obj)
    

    The simulation function is meant to simulate values of y. The way it works in the gamlss example is that the simulation function uses the same distributional assumption as the model. What I did below is to add the fitted values to a draw from the presumed distribution of the residuals - $N(0,\sigma)$. This is where you will need to make sure that theoretically this is doing the right thing. It does have an analogue to the gamlss example because in that example, they are treating the model parameters as known values and feeding those to the random variate generator. We are essentially doing the same here, treating the mean and variance of the fitted values as known, but drawing from the presumed normal distribution.

    s.fun <- function(n, obj) {
      fitted(obj) + rnorm(n, 0, summary(obj)$sigma)
    }
    

    The next thing to deal with is the fitting function. This doesn't take the original data as an argument, but the simulated y values that are produced with s.fun(). Note here that I use the argument .y to distinguish it from y (the variable name in the data set). Then, you use .y as the outcome variable in the nls() call, but use all other variables from the original dataset.

    f.fun <- function(.y, ...) {
      nls(.y ~ nonlinear_func(x, beta_0, beta_1), 
          start = list(beta_0 = 16, beta_1 = -0.2), 
          data = data.set)
    }
    

    Once you fix those problems, you get something like what you intend. Again, though, you should make sure you're comfortable with how s.fun() is working.

    library(hnp)
    #> Loading required package: MASS
    hnp(nonlinear_model, newclass = TRUE, 
        diagfun = d.fun, simfun = s.fun, 
        fitfun = f.fun, data = data.set)
    

    Created on 2024-01-15 with reprex v2.0.2