Search code examples
rfunctionnlmemgcv

Error including correlation structure in function with gamm


I am trying to create my own function that contains 1.) the mgcv gamm function and 2.) a nested autocorrelation (ARMA) argument. I am getting an error when I try to run the function like this:

df <- AirPassengers
df <- as.data.frame(df)
df$month <- rep(1:12)
df$yr <- rep(1949:1960,each=12)
df$datediff <- 1:nrow(df)

try_fxn1 <- function(dfz, colz){gamm(dfz[[colz]] ~ s(month, bs="cc",k=12)+s(datediff,bs="ts",k=20), data=dfz,correlation = corARMA(form = ~ 1|yr, p=2))}

try_fxn1(df,"x")

Error in eval(predvars, data, env) : object 'dfz' not found

I know the issue is with the correlation portion of the formula, as when I run the same function without the correlation structure included (as seen below), the function behaves as expected.

try_fxn2 <- function(dfz, colz){gamm(dfz[[colz]] ~ s(month, bs="cc",k=12)+ s(datediff,bs="ts",k=20), data=dfz)}

try_fxn2(df,"x")

Any ideas on how I can modify try_fxn1 to make the function behave as expected? Thank you!


Solution

  • You are confusing a vector with the symbolic representation of that vector when building a formula.

    You don't want dfz[[colz]] as the response in the formula, you want x or whatever you set colz too. What you are getting is

    dfz[[colz]] ~ ...
    

    when what you really want is the variable colz:

    colz ~ ...
    

    And you don't want a literal colz but whatever colz evaluates to. To do this you can create a formula by pasting the parts together:

    fml <- paste(colz, '~ s(month, bs="cc", k=12) + s(datediff,bs="ts",k=20)')
    

    This turns colz into whatever it was storing, not the literal colz:

    > fml
    [1] "x ~ s(month, bs=\"cc\", k=12) + s(datediff,bs=\"ts\",k=20)"
    

    Then convert the string into a formula object using formula() or as.formula().

    The final solution then is:

    fit_fun <- function(dfz, colz) {
        fml <- paste(colz, '~ s(month, bs="cc", k=12) + s(datediff,bs="ts",k=20)')
        fml <- formula(fml)
        gamm(fml, data = df, correlation = corARMA(form = ~ 1|yr, p=2))
    }
    

    This really is not an issue with corARMA() part, other than that triggers somewhat different evaluation code for the formula. The guiding mantra here is to always get a formula as you would type it if not programming with formulas. You would never (or should never) write a formula like

    gamm(df[[var]] ~ x + s(z), ....)
    

    While this might work in some settings, it will fail miserably if you ever want to use predict()` and it fails when you have to do something a little more complicated.