Search code examples
rmodelinggam

How do you use an offset term for gamlasso? Currently I get a subscript error


I am using a gamlasso to produce smooth estimates of a variable, which works. However, an error occurs when an offset term is incorporated? Unfortunately I'm unable to share the dataset, but hopefully to code is sufficiently explanatory.

The code below works as I've hashed the offset term (and the variables inclusion in the formula which I don't think is necessary/right). However, including the offset term results in an error listed below. the offset variable is a column in the data and has datatype numeric. Logging the variable doesn't affect the datatype.


# model matrix for linear terms
gamlasso_data$X <- model.matrix(~ as.factor(time) + age_0,
                                data = gamlasso_data)[,-1]   # -1 removes default intercept column

# formula approach - age/time model
gamlassofit = gamlasso(count ~ X +
        #                      population_est + 
                               s(age, time, bs='fs'),
                       family = 'poisson`',
                       data = gamlasso_data,
                       num.knots = -1,
                       seed = 1,
        #              offset = log(gamlasso_data$population_est),
                       num.iter = 10) 
including offset term error:
Must extract column with a single valid subscript.
x Subscript `var` has the wrong type `double`.
i It must be numeric or character.

Its potentially worth mentioning that a similar method using GAMs works fine with the offset. See code below:

fitted_gam <- gam(count ~ s(age, time, bs='fs'),
                  data = gamlasso_data,
                  family = poisson,
                  offset = log(population_est),
                  method = "REML")

Please let me know if I can clarify or improve this question. Thank you.


Solution

  • As far as I can tell, the offset needs to be given as the name of a column within data. This would mean that you'd need to pre-compute the log-population and save it as log_totalpop (or something).

    For example, this works (although it may not be a sensible model ...)

    library(plsmselect)
    data(simData)
    simData$X = model.matrix(~x1+x2+x3+x4+x5+x6+x7+x8+x9, data=simData)[,-1]
    simData$p <- abs(round(simData$Yg))
    gfit = gamlasso(p ~ X +
                            s(z1, k=5, bs="ts"),
                            data = simData,
                            offset = "x10",
                            family = "poisson",
                            seed=1)