Search code examples
rregressionnon-linear-regressionnls

non-numeric argument to binary operator with nls


I'd like to fit a non-linear model using nls, but I'm getting a non-numeric argument to binary operator error:

df <- data.frame(task=rep(c("x", "y", "z"), each=6), time=rep(c(1:18)), y=c(1, 5, 9, 14, 17, 29, 2, 3, 1, 5, 2, 3, 29, 21, 18, 16, 14, 5))

OLS <- lm(y~ A + time*task, data=df)

I'm trying to get a similar result with the nls function:

NLS <- nls(y ~ A + B*time*task, data=df, start=list(A=1,B=1))

Error in B * time * task : non-numeric argument to binary operator

It seems I'm not using nls the way it should be used, so what is it that am I doing wrong?


Solution

  • tl;dr because you can't multiply a factor variable (task) by a numeric variable (time).

    If you want to use nls to fit linear models (for whatever reason) you need to replicate the things that lm() is doing under the hood. In particular, in R formula language, time*task means "effect of time, effect of task, and their interaction". If both variables were numeric, then the interaction variable is just the regular product of the two variables, but if at least one is numeric, something more complicated happens. The simplest way to replicate is to use model.matrix():

    X <- as.data.frame(model.matrix(~time*task, data =df))
    names(X)
    ## "(Intercept)" "time"        "tasky"       "taskz"       "time:tasky" "time:taskz" 
    X$y <- df$y ## add response variable (slight abuse of notation)
    

    Because task has three levels, we need a total of five derived variables (plus the intercept) to fit the full model.

    NLS <- nls(y ~ A + b1*time + b2*tasky + b3*taskz + b4*`time:tasky` + b5*`time:taskz`,
               data=X, start=list(A=1,b1 = 0, b2 = 0, b3 = 0, b4 = 0, b5 = 0))
    coef(NLS)
    ##         A        b1        b2        b3        b4        b5 
    ## -5.600000  5.171429  6.638095 86.095238 -5.000000 -9.257143 
    

    Compare with OLS:

    OLS <- lm(y~ 1 + time*task, data=df)  ## 'A' doesn't work
    coef(OLS)
    ## (Intercept)        time       tasky       taskz  time:tasky  time:taskz 
    ##   -5.600000    5.171429    6.638095   86.095238   -5.000000   -9.257143 
    

    You can also embed a linear submodel in a nonlinear model using the params argument of nlme::gnls:

    GNLS <- nlme::gnls(y ~ mu, data = df, 
                       params = list(mu ~ 1 + time*task),
                       start = list(mu = rep(0,6)))
    

    Same coefficients.