I'm running a GAM using the mgcv package. Here is my model code. I'm using method ="ML"
as I'm hoping to use the MuMIn package and dredge()
for model selection on this model, and you can't use REML for model comparison.
mDEG_nb <- bam(deg ~ s(SE_score) + s(ri) + TL + species + sex + season + year +
s(code, bs = 're') + s(station, bs = 're'),
family=nb(), data=node_dat, method = "ML")
th1 <- mDEG_nb$family$getTheta(TRUE) #extracts theta
mDEG_nb <- bam(deg ~ s(SE_score) + s(r)i + TL + species + sex + season + year +
s(code, bs = 're') + s(station, bs = 're'),
family=nb(th1), data=node_dat, method = "ML", na.action = "na.fail")
However using gam.check()
I can see that I have an issue with the residuals with one of the smooths.
Method: ML Optimizer: outer newton
full convergence after 7 iterations.
Gradient range [-0.001295518,0.002610357]
(score 82388.55 & scale 1).
Hessian positive definite, eigenvalue range [0.001293007,29.20119].
Model rank = 224 / 224
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(SE_score) 9.00 1.00 0.90 0.59
s(ri) 9.00 8.98 0.34 <2e-16 ***
s(code) 134.00 100.15 NA NA
s(station) 63.00 54.46 NA NA
To deal with this, I tried to increase the value of K, however I keep getting this error indicating there's a problem issue estimating theta when I use values of K greater than 12.
Warning messages:
1: In estimate.theta(theta, family, G$y, linkinv(eta), scale = scale1, :
step failure in theta estimation
2: In estimate.theta(theta, family, G$y, linkinv(eta), scale = scale1, :
step failure in theta estimation
In addition I still have the residual issue no matter how much I increase K by. But that could be due to the error not calculating a new value of theta.
Is this warning an issue? And how can I change my model so that the residuals of this smoother are no longer a problem? If the residuals are not random, how does this impact my model results? I assume they are not to be trusted?
An example of my dataset is below should you need it.
> dput(head(node_dat))
structure(list(station = structure(c(17L, 49L, 23L, 25L, 25L,
9L), .Label = c("BE01", "BE02", "BEUWM01", "BL01", "BL02", "GCB01",
"GCB02", "GCB03", "NI01", "NI01b", "NI03", "PB01", "PB02", "PB03",
"PB04", "PB05", "PB06", "PB07", "PB08", "PB09", "PB10", "PB11",
"PB12", "PB13", "PB14", "PB15", "PB16", "PB17", "PB18", "PB19",
"PB20", "PB21", "PB22", "PB23", "PB24", "PB25", "PB26", "PB27",
"PB28", "PB29", "PB30", "PB4G01", "PB4G02", "PBUWM01", "PBUWM02",
"SA01", "SA02", "SA02b", "SA03", "SA04", "SA05", "SA06", "SA07",
"SA11", "SAUWM01", "SB01", "SB02/AR02", "SB03/AR05", "SB04/AR06",
"VB01", "VB02", "VB03", "VB04"), class = "factor"), monthyear = structure(c(27L,
17L, 38L, 4L, 19L, 29L), .Label = c("2014/01", "2014/02", "2014/03",
"2014/04", "2014/05", "2014/06", "2014/07", "2014/08", "2014/09",
"2014/10", "2014/11", "2014/12", "2015/01", "2015/02", "2015/03",
"2015/04", "2015/05", "2015/06", "2015/07", "2015/08", "2015/09",
"2015/10", "2015/11", "2015/12", "2016/01", "2016/02", "2016/03",
"2016/04", "2016/05", "2016/06", "2016/07", "2016/08", "2016/09",
"2016/10", "2016/11", "2016/12", "2017/01", "2017/02", "2017/03",
"2017/04", "2017/05", "2017/06", "2017/07", "2017/08", "2017/09",
"2017/10", "2017/11", "2017/12", "2018/01", "2018/02", "2018/03",
"2018/04", "2018/05", "2018/06", "2018/07", "2018/08", "2018/09",
"2018/10", "2018/11", "2018/12"), class = "factor"), code = structure(c(99L,
204L, 183L, 146L, 4L, 135L), .Label = c("2390", "13573", "13574",
"13575", "13576", "19318", "19319", "19321", "19322", "19506",
"19514", "19519", "19520", "19524", "25537", "25540", "25541",
"25543", "25546", "25549", "25552", "25553", "27583", "27585",
"27586", "27591", "27592", "27593", "27594", "27595", "27596",
"27597", "27600", "27601", "27605", "27607", "27608", "27613",
"27614", "27617", "27619", "27620", "27621", "27626", "27627",
"27629", "27630", "27631", "27632", "28608", "28611", "28612",
"28618", "28625", "28628", "28629", "28631", "28632", "28633",
"28638", "28641", "28644", "28662", "28672", "28674", "52978",
"54815", "54846", "54852", "54860", "54863", "54865", "54866",
"54868", "54877", "54882", "54883", "54884", "54886", "54890",
"54892", "54895", "54896", "54901", "54904", "54914", "54919",
"54920", "54922", "54925", "54931", "54932", "54938", "54952",
"54954", "54955", "54958", "54959", "54962", "59950", "59953",
"59954", "59955", "59957", "59958", "59959", "59960", "59961",
"59962", "59964", "59966", "59969", "59970", "59971", "59972",
"59973", "59975", "59976", "59979", "59981", "59988", "2388",
"12950", "12952", "12956", "12958", "12960", "12962", "12964",
"12966", "12968", "13577", "14203", "19320", "19523", "25534",
"25535", "25536", "25539", "25542", "25544", "25545", "25547",
"25548", "25550", "27584", "27588", "27589", "27590", "27598",
"27599", "27602", "27603", "27604", "27606", "27609", "27610",
"27611", "27615", "27616", "27618", "27622", "27624", "27625",
"28624", "28627", "28637", "28639", "28642", "28660", "28670",
"34176", "34177", "34178", "34179", "52975", "52977", "54817",
"54821", "54822", "54825", "54845", "54849", "54880", "54887",
"54889", "54893", "54898", "54899", "54905", "54911", "54912",
"54915", "54933", "54947", "54957", "54961", "59951", "59963",
"59968", "59978", "59991", "59992", "59993", "59994", "59995"
), class = "factor"), species = structure(c(1L, 2L, 2L, 2L, 1L,
2L), .Label = c("Grey Reef Shark", "Silvertip Shark"), class = "factor"),
deg = c(0, 0, 0, 0, 0, 0), gs = c(0, 0, 0, 0, 0, 0), btw = c(0,
0, 0, 0, 0, 0), ud = c(0, 0, 0, 0, 0, 0), ri = c(0, 0, 0,
0, 0, 0), SE_score = c(0.35, 0.39, 0.18, 0.23, 0.36, 0.42
), region = structure(c(5L, 6L, 5L, 5L, 5L, 4L), .Label = c("Benares",
"Blenheim", "Grand Chagos Bank", "Nelsons Island", "Peros Banhos",
"Saloman", "Speakers Bank", "Victory Bank"), class = "factor"),
date = structure(c(1456790400, 1430434800, 1485907200, 1396306800,
1435705200, 1462057200), class = c("POSIXct", "POSIXt"), tzone = ""),
month = c(3, 5, 2, 4, 7, 5), season = structure(c(2L, 1L,
2L, 1L, 1L, 1L), .Label = c("dry.season", "wet.season"), class = "factor"),
year = structure(c(3L, 2L, 4L, 1L, 2L, 3L), .Label = c("2014",
"2015", "2016", "2017", "2018"), class = "factor"), sex = structure(c(1L,
2L, 2L, 1L, 1L, 2L), .Label = c("F", "M"), class = "factor"),
TL = c(117, 157, 137, 108, 94, 137), TL_stand = c(0.353383458646617,
0.654135338345865, 0.503759398496241, 0.285714285714286,
0.180451127819549, 0.503759398496241), btw_stand = c(0, 0,
0, 0, 0, 0)), na.action = structure(c(`59` = 59L, `91` = 91L,
`119` = 119L, `144` = 144L, `715` = 715L, `754` = 754L, `780` = 780L,
`803` = 803L, `2116` = 2116L, `2452` = 2452L, `2489` = 2489L,
`2504` = 2504L, `2544` = 2544L, `3070` = 3070L, `3092` = 3092L,
`3126` = 3126L, `3151` = 3151L, `4464` = 4464L, `4800` = 4800L,
`4842` = 4842L, `4862` = 4862L, `4893` = 4893L, `6181` = 6181L,
`8221` = 8221L, `10073` = 10073L, `11232` = 11232L, `11603` = 11603L,
`11639` = 11639L, `11663` = 11663L, `11688` = 11688L, `12266` = 12266L,
`12288` = 12288L, `12322` = 12322L, `12347` = 12347L, `13660` = 13660L,
`14023` = 14023L, `14045` = 14045L, `14075` = 14075L, `14104` = 14104L,
`15417` = 15417L, `15780` = 15780L, `15795` = 15795L, `15837` = 15837L,
`15877` = 15877L, `17138` = 17138L, `17164` = 17164L, `17194` = 17194L,
`17219` = 17219L, `18532` = 18532L, `18895` = 18895L, `18917` = 18917L,
`18951` = 18951L, `18976` = 18976L, `20289` = 20289L, `20652` = 20652L,
`20674` = 20674L, `20704` = 20704L, `20729` = 20729L, `22055` = 22055L,
`22409` = 22409L, `22435` = 22435L, `22461` = 22461L, `22490` = 22490L,
`23803` = 23803L, `24166` = 24166L, `24188` = 24188L, `24218` = 24218L,
`24247` = 24247L, `25560` = 25560L, `25919` = 25919L, `25939` = 25939L,
`25976` = 25976L, `25996` = 25996L, `27308` = 27308L, `27330` = 27330L,
`27360` = 27360L, `27385` = 27385L, `28702` = 28702L, `29065` = 29065L,
`29087` = 29087L, `29121` = 29121L, `29146` = 29146L, `30459` = 30459L,
`30822` = 30822L, `30844` = 30844L, `30874` = 30874L, `30903` = 30903L,
`32216` = 32216L, `32579` = 32579L, `32605` = 32605L, `32631` = 32631L,
`32660` = 32660L, `33973` = 33973L, `34336` = 34336L, `34369` = 34369L,
`34397` = 34397L, `34415` = 34415L, `34875` = 34875L, `34901` = 34901L,
`34931` = 34931L, `34956` = 34956L, `36269` = 36269L, `36632` = 36632L,
`36658` = 36658L, `36684` = 36684L, `36709` = 36709L, `38026` = 38026L,
`38389` = 38389L, `38415` = 38415L, `38441` = 38441L, `38466` = 38466L,
`39783` = 39783L, `40146` = 40146L, `40168` = 40168L, `40198` = 40198L,
`40223` = 40223L, `41540` = 41540L, `41914` = 41914L, `41937` = 41937L,
`41960` = 41960L, `41984` = 41984L, `43297` = 43297L, `43633` = 43633L,
`43675` = 43675L, `43695` = 43695L, `43730` = 43730L, `45014` = 45014L,
`45363` = 45363L, `45400` = 45400L, `45415` = 45415L, `45455` = 45455L,
`45954` = 45954L, `45991` = 45991L, `46009` = 46009L, `46048` = 46048L,
`46541` = 46541L, `46576` = 46576L, `46602` = 46602L, `46627` = 46627L,
`46817` = 46817L, `46859` = 46859L, `46879` = 46879L, `46910` = 46910L,
`48198` = 48198L, `48547` = 48547L, `48589` = 48589L, `48609` = 48609L,
`48640` = 48640L, `49928` = 49928L, `50277` = 50277L, `50319` = 50319L,
`50345` = 50345L, `50370` = 50370L, `51658` = 51658L, `52007` = 52007L,
`52048` = 52048L, `52069` = 52069L, `52100` = 52100L, `53388` = 53388L,
`53737` = 53737L, `53778` = 53778L, `53799` = 53799L, `53830` = 53830L,
`55118` = 55118L, `55467` = 55467L, `55508` = 55508L, `55529` = 55529L,
`55560` = 55560L, `56848` = 56848L, `57197` = 57197L, `57238` = 57238L,
`57264` = 57264L, `57295` = 57295L, `58555` = 58555L, `58596` = 58596L,
`58617` = 58617L, `58648` = 58648L, `59936` = 59936L, `60285` = 60285L,
`60322` = 60322L, `60337` = 60337L, `60377` = 60377L, `60875` = 60875L,
`60905` = 60905L, `60931` = 60931L, `60972` = 60972L, `61441` = 61441L,
`61463` = 61463L, `61497` = 61497L, `61522` = 61522L, `62835` = 62835L,
`63197` = 63197L, `63236` = 63236L, `63260` = 63260L, `63276` = 63276L,
`63793` = 63793L, `64180` = 64180L, `64206` = 64206L, `64232` = 64232L,
`64261` = 64261L, `65574` = 65574L, `65937` = 65937L, `65959` = 65959L,
`65993` = 65993L, `66018` = 66018L, `67331` = 67331L, `67694` = 67694L,
`67716` = 67716L, `67746` = 67746L, `67772` = 67772L, `69088` = 69088L,
`69424` = 69424L, `69466` = 69466L, `69486` = 69486L, `69517` = 69517L,
`70805` = 70805L, `72253` = 72253L, `73419` = 73419L, `73760` = 73760L,
`73802` = 73802L, `73828` = 73828L, `73859` = 73859L, `75141` = 75141L,
`76590` = 76590L, `77752` = 77752L, `78906` = 78906L, `79486` = 79486L,
`79523` = 79523L, `79536` = 79536L, `79556` = 79556L, `80122` = 80122L,
`80159` = 80159L, `80174` = 80174L, `80214` = 80214L, `82125` = 82125L
), class = "omit"), row.names = c(31669L, 80335L, 63799L, 59674L,
1051L, 51949L), class = "data.frame")
After a bit of help from Simon Wood, I found an answer to removing residual errors. The issue was basically that ri=0 guarantees that deg=0, but to match that with a smooth requires a very sharp upswing from ri=0 to ri>0. As such you can use an adaptive spline, s(ri,bs="ad")
, to deal with this upswing.