I'm running a fairly hefty gam im mgcv (R version 4.3.1 on linux, ubuntu 22.0), and i've been running across this error. here is what the gam looks like:
model_ndvi <-
gam(list(
# mean predictor
NDVI ~ cutblock +
s(long, lat, bs = 'ds', k = 100, by = cutblock) + #observe trends geographically
s(year, bs = 'cr', k = 7, by = cutblock) + #view trends over time
s(doy, bs = 'cc', k = 8, by = cutblock) + #view yearly trends
s(cb_age, bs = "cr", k = 3) + #view trends by cutblock age
ti(year, doy, bs = 'cc') + #view how yearly trends changed over time
ti(cb_age, year, bs = 'cr'), #see how the cutblock trends changed over time
# variance predictor
~ cutblock +
s(long, lat, bs = 'ds', k = 50, by = cutblock) +
s(year, bs = 'cr', k = 5, by = cutblock) +
s(doy, bs = 'cc', k = 4, by = cutblock)+
s(cb_age, bs = "cr", k = 3) +
ti(year, doy, bs = 'cc') +
ti(cb_age, year, bs = 'cr')),
family = betals(), #beta location scale distribution for the data
data = babyd,
method = 'REML',
knots = list(doy = c(0.5, 366.5)),
control = gam.control(nthreads = 30), trace = TRUE)
i've been running across the error
Error in backsolve(L, forwardsolve(t(L), (D * A)[piv, ]))[ipiv, , drop = FALSE] : incorrect number of dimensions
when trying to run the model. given that my dataset is about 87 GB large, i have been testing the model on a much smaller subset of data (about 90 MB), obtained from the original dataset using the slice
function. from what i have seen, this error has to do with subsetting data that doesn't exist. i was worried that when i extracted a smaller amount of data from the original data set, there wouldn't be enough unique values and that would cause this error, but after inspecting the data by running unique
on each variable i have seen that, as far as i know, there are more than enough unique values in each variable for the model to run.
here is a snippet of what the data looks like (using head
):
long lat NDVI cutblock harvestyr date dec_date year doy cb_age
1 -127.2368 51.60493 -0.0275 FALSE NA 2000-02-18 2000.131 2000 49 NA
2 -127.2368 51.60712 -0.0053 FALSE NA 2000-02-18 2000.131 2000 49 NA
3 -127.2368 51.60932 0.1173 FALSE NA 2000-02-18 2000.131 2000 49 NA
4 -127.2368 51.61151 0.2855 FALSE NA 2000-02-18 2000.131 2000 49 NA
5 -127.2368 51.61371 0.2417 FALSE NA 2000-02-18 2000.131 2000 49 NA
6 -127.2368 51.61591 0.4240 FALSE NA 2000-02-18 2000.131 2000 49 NA
and here is what it looks like when i run str
on the data:
'data.frame': 1474926 obs. of 10 variables:
$ long : num -127 -127 -127 -127 -127 ...
$ lat : num 51.6 54 53.4 52.9 52.1 ...
$ NDVI : num -0.0275 0.4184 -0.0103 -0.0292 0.5711 ...
$ cutblock : Factor w/ 2 levels "FALSE","TRUE": 1 1 1 1 1 1 1 1 1 1 ...
$ harvestyr: num NA NA NA NA NA NA NA NA NA NA ...
$ date : Date, format: "2000-02-18" "2000-02-18" "2000-02-18" "2000-02-18" ...
$ dec_date : num 2000 2000 2000 2000 2000 ...
$ year : int 2000 2000 2000 2000 2000 2000 2000 2000 2000 2000 ...
$ doy : int 49 49 49 49 49 49 49 49 49 49 ...
$ cb_age : int NA NA NA NA NA NA NA NA NA NA ...
for a bit more context about the model in case anyone needs more information - i'm looking at ndvi trends over the past 20 or so years within and outside of clearcut areas in forests (called cutblocks in the data) and trying to determine at what point these clearcut areas recover back to full productivity, taking into account the changes in both mean and variance. the betals
function is a beta location scale distribution function, written for this project by simon wood (author of mgcv
).
i've tried looking at the structure of the data to see if that is the issue, but all information online is very simple in relation to what i am trying to do. i'm wondering if anyone has come across this error before in mgcv! any help is appreciated :))
the solution to this ended up being quite simple: the location scale family wasn't automatically selecting the minimum possible number of knots for one of the terms. while i specified k = 3
, the minimum was 5, so changing it fixed the error.
this seems to be an issue for other location scale distribution families in mgcv as well - we tested the model using gaulss
and had the same error, but changing the knots fixed it.