Search code examples
rsurvival-analysisnls

Fitting of nls model in R - manual versus brute force issues


I am trying to fit several models to the same data, in this example it's a double exponential approach. I have tried to eyeball the parameters, as well as use a brute force approach (see below.

In this case it seems that the manual setting of the parameters (black line) is better than the brute force approach (red line). I don't really understand why this is the case, because the parameter space set in the starting values of the brute force method includes the parameters that are set for the manual approach.

Could someone explain why the brute force method isn't converging on the same parameters, or at least have a better fit?

In addition, if I could have advice on the best fit for the data. I've tried lnorm, power, exp, and double exp.

Thanks!

# loading packages
library(nls2)
library(tidyverse)

# example dataset
df = data.frame(
  time = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 16, 18, 21, 23, 32, 33),
  proportion = c(
    1.00000000, 0.89583333, 0.77083333, 0.58333333, 0.54166667,
    0.43750000, 0.35416667, 0.31250000, 0.25000000, 0.22916667,
    0.16666667, 0.14583333, 0.12500000, 0.10416667, 0.08333333,
    0.06250000, 0.04166667, 0.02083333, 0.00000000
  )
)


# Fitting model using manual approach
manual_fit <- nls2::nls2(proportion ~ a * exp(b * time) + c * exp(d * time),
                          data = df,
                          start = list(a = 1, b = -0.1, c = 1, d = -0.5))


# setting starting parameter space for brute approach
start_values <- list(a = seq(-.5, 1.5, length.out = 10), 
                     b = seq(-.5, 2, length.out = 10),
                     c = seq(-.5, 2, length.out = 10),
                     d = seq(-1, 2, length.out = 10))

# fitting model using brute approach
brute_fit <- nls2::nls2(proportion ~ a * exp(b * time) + c * exp(d * time), 
                                       data = df, 
                                       start = expand.grid(start_values),
                                       control = nls.control(maxiter = 2000), algorithm = "brute-force") 


# plotting to see both fits
ggplot(df) +
  geom_point(aes(x = time, y = proportion), size = 3.5) +
  geom_line(aes(x = time, y = predict(manual_fit)), linewidth = 1) +
  geom_line(aes(x = time, y = predict(brute_fit)), linewidth = 1, colour = "red") +
  ylab("Proportion > Time") +
  xlab("Time (days)")

Created on 2024-07-01 with reprex v2.1.0


Solution

  • There is no concept of convergence in the brute force algorithm. All it does is evaluate the objective at each point and then take the best. It is normally used to get starting values which are then fed into nls. Note that if given a 2 row data frame it will create the grid for you as shown below.

    library(nls2)
    
    fo <- proportion ~ a * exp(b * time) + c * exp(d * time)
    st <- data.frame(a = c(-0.5, 1.5), b = c(-0.5, 2), c = c(-0.5, 2), d = c(-1, 2))
    fm0 <- nls2(fo, df, start = st, alg = "brute", control = list(maxiter = 10^4))
    
    fm <- nls(fo, df, start = coef(fm0))
    fm
    ## Nonlinear regression model
    ##   model: proportion ~ a * exp(b * time) + c * exp(d * time)
    ##    data: df
    ##        a        b        c        d 
    ##  1.00025 -0.18194  0.03566 -0.02066 
    ##  residual sum-of-squares: 0.008595
    ##
    ## Number of iterations to convergence: 10 
    ## Achieved convergence tolerance: 6.554e-06
    

    Instead you might try the plinear algorithms which can be used to avoid starting values for the parameters that enter linearly. Here we use plinear-random with nls2 and then use plinear with nls.

    set.seed(123)
    fo.p <- proportion ~ cbind(a = exp(b * time), c = exp(d * time))
    fm0.p <- nls2(fo.p, df, start = st[c("b", "d")], alg = "plinear-random",
      control = list(maxiter = 10^2))
    
    fm.p <- nls(fo.p, df, start = coef(fm0.p)[c("b", "d")], alg = "plinear")
    fm.p
    ## Nonlinear regression model
    ##   model: proportion ~ cbind(a = exp(b * time), c = exp(d * time))
    ##    data: df
    ##        b        d   .lin.a   .lin.c 
    ## -0.02066 -0.18194  0.03566  1.00025 
    ##  residual sum-of-squares: 0.008595
    ##
    ## Number of iterations to convergence: 17 
    ## Achieved convergence tolerance: 5.783e-06