Significance of fixed effects in negative binomial mixed model using mgcv gam

I am using gam from the mgcv package to analyze a dataset with 24 entries :

ran  f1     f2   y
1   3000    5   545
1   3000    10  1045
1   10000   5   536
1   10000   10  770
2   3000    5   842
2   3000    10  2042
2   10000   5   615
2   10000   10  1361
3   3000    5   328
3   3000    10  1028
3   10000   5   262
3   10000   10  722
4   3000    5   349
4   3000    10  665
4   10000   5   255
4   10000   10  470
5   3000    5   680
5   3000    10  1510
5   10000   5   499
5   10000   10  1422
6   3000    5   628
6   3000    10  2062
6   10000   5   499
6   10000   10  2158

The data has two fixed effects (f1 and f2) and one random effect (ran). The dependent data is y. Because the dependent data y represents counts and is overdispersed, I am using a negative binomial model.

The gam model and its summary output is as follows:

library(mgcv)
summary(gam(y ~ f1 * f2 + s(ran, bs = "re"), data = df2, family = nb, method = "REML"))

Family: Negative Binomial(27.376) 
Link function: log 

Formula:
y ~ f1 * f2 + s(ran, bs = "re")

Parametric coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  5.500e+00  3.137e-01  17.533  < 2e-16 ***
f1          -3.421e-05  3.619e-05  -0.945    0.345    
f2           1.760e-01  3.355e-02   5.247 1.55e-07 ***
f1:f2        2.665e-07  4.554e-06   0.059    0.953    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
         edf Ref.df Chi.sq p-value    
s(ran) 4.726      5  85.66  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.866   Deviance explained = 93.6%
-REML = 185.96  Scale est. = 1         n = 24

The Wald test from summary gives very high significance for f2 (P = 1.55e-07). However, when I test the significance of f2 by comparing two different models using anova, I get dramatically different results:

anova(gam(y ~ f1 * f2 + s(ran, bs = "re"), data = df2, family = nb, method = "ML"),
gam(y ~ f1 + s(ran, bs = "re"), data = df2, family = nb, method = "ML"),
test="Chisq")

Analysis of Deviance Table

Model 1: y ~ f1 * f2 + s(ran, bs = "re")
Model 2: y ~ f1 + s(ran, bs = "re")
  Resid. Df Resid. Dev      Df Deviance Pr(>Chi)
1    14.843     18.340                          
2    16.652     21.529 -1.8091   -3.188   0.1752

f2 is no longer significant. The models were changed from REML to ML, as recommended for evaluation of fixed effects.

If the interaction is preserved, f2 still remains insignificant using anova:

anova(gam(y ~ f1 + f2 + f1:f2 + s(ran, bs = "re"), data = df2, family = nb, method = "ML"),
gam(y ~ f1 + f1:f2 + s(ran, bs = "re"), data = df2, family = nb, method = "ML"),
test="Chisq")
Analysis of Deviance Table

Model 1: y ~ f1 + f2 + f1:f2 + s(ran, bs = "re")
Model 2: y ~ f1 + f1:f2 + s(ran, bs = "re")
  Resid. Df Resid. Dev       Df Deviance Pr(>Chi)
1    14.843     18.340                           
2    15.645     19.194 -0.80159 -0.85391   0.2855

I would be very grateful for advice on which of these approaches is more appropriate. Many thanks!

Solution

The WARNING section of ?anova.gam says:

If models a and b differ only in terms with no un-penalized components (such as random effects) then p values from anova(a,b) are unreliable, and usually much too low.

I'd guess that the p-value is unreliable but in this instance you have a situation where the opposite case to that expected is observed - p-values are much larger.

However, I think you aren't comparing the right models. Unless you know what you are doing, the principle of marginality should be observed when comparing models with interactions.

So I would compare a model with main effects of f1 and f2 with a model that included these main effect and their interaction.

model 1: y ~ f1 * f2 + s(ran, bs = "re")
model 2: y ~ f1 + f2 + s(ran, bs = "re")

Unless there is somethign you aren't telling us about how your models are setup, you shouldn't include a higher-order term without including the lower order terms that are found in the higher order term. For example, you have f1 + f1:f2 and f2 is found in the second order term but is not found as a first order term in the model.