I would like to create a wrapper for the Levenberg-Marquardt Nonlinear Least-Squares function nls.lm
(minpack.lm library) similar to nls2
(nls2 library) to give a brute force method for evaluating the fit of a model to observed data.
The idea is to create a range of starting value combinations and either:
or
I wanted to do this without looping and after inspiration from here am trying to use nested dataframes, with one column for the parameter input list, one for the values returned by my function, one for the R^2 values, and one for the best fit models,something like:
df
# start_val fun_out R^2
# 1 {a=2,b=2} {22,24,26...} 0.8
# 2 {a=3,b=5} {35,38,41...} 0.6
This is the code I have so far:
require(dplyr);require(tidyr)
foo <- function(x,a,b) a*x^2+b # function I am fitting
x <- 1:10 # independent variable
y_obs <- foo(x,1.5,2.5) + rnorm(length(x),0,10) # observed data (dependent variable)
start_range <- data.frame(a=c(1,2),b=c(2,3)) # range of allowed starting points for fitting
reps <- 2 # number of starting points to generate
# Create a data frame of starting points
df<-as.data.frame(sapply(start_range, function(x) runif(reps,min=x[[1]],max=x[[2]]))) %>%
mutate(id=seq_len(reps)) %>% # fudge to make nest behave as I want
nest(1:ncol(start_range)) %>%
mutate(data=as.list(data)) %>%
as.data.frame()
df
# id data
# 1 1 1.316356, 2.662923
# 2 2 1.059356, 2.723081
I get stuck now trying to pass the parameters in data into the function foo()
. I've tried using do.call()
, and even with using constant parameters the following error appears:
mutate(df,y=do.call(foo,list(x,1,2)))
# Error: wrong result size (5), expected 2 or 1
Is there a way to create columns of a dataframe which contain lists directly without using nest()
?
Also when trying to create the list to pass to do.call()
using the dataframe columns, how do you create a list where the first element is the vector x, the second is the parameter a and the third is the parameter b? The follwing splits the list down the column:
mutate(df,my_list=list(x,data))
# id data my_list
# 1 1 1.316356, 2.662923 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
# 2 2 1.059356, 2.723081 1.316356, 2.662923, 1.059356, 2.723081
Running nls2
using algorithm = "random-search"
and all = TRUE
and the specified maxiter
will evaluate foo
at maxiter
random points and return starting_fits
which are the fits at those points. It consists of a set of "nls"
class objects evaluated at each of the randomly chosen starting values. It does not do an optimization from each of these starting values but just returns the "nls"
object at each. That is, nls
is not run. Now for each starting fit run nlsLM
giving fits
, a list of nlsLM
fits and from that summarize them in data
(a data frame with one row per run) and show the least.
If we only want to pick the best starting value and just run nlsLM
once from that then use the alternate code near the end.
library(nls2)
fo <- y_obs ~ foo(x, a, b)
starting_fits <- nls2(fo, algorithm = "random-search",
start = start_range, control = nls.control(maxiter = reps), all = TRUE)
fits <- lapply(starting_fits, function(fit) nlsLM(fo, start = coef(fit)))
data <- data.frame(RSS = sapply(fits, deviance), t(sapply(fits, coef)),
start = t(sapply(starting_fits, coef)))
# data$fits <- fits # optional to store each row's fitted object in that row
subset(data, RSS == min(RSS)) # minimum(s)
giving:
RSS a b start.a start.b
2 706.3956 1.396616 7.226525 1.681819 2.768374
R squared is used for linear regression. It is not valid for nonlinear regression. Residual sum of squares (RSS) is shown above instead.
Alternately if you just want to pick out the best starting value and run nlsLM on that then just omit the all=TRUE
argument from the nls2
call and do this. If you need the coefficients and RSS for later code then try coef(fit)
and deviance(fit)
.
starting_fit <- nls2(fo, algorithm = "random-search",
start = start_range, control = nls.control(maxiter = reps))
fit <- nlsLM(fo, start = coef(starting_fit))
Note 1: If you are getting errors from nlsLM
try replacing nlsLM(...)
with try(nlsLM(...))
. This will issue error messages (use try(..., silent = TRUE)
if you don't want them) but will not stop processing.
Note 2: I assume that the foo
shown in the question is just an example and the real function is more complex. The foo
shown is linear in the coefficients so one could use lm
for it. No nonlinear optimization is needed.