
Extremely high deviance explained but no significant predictors in mgcv::gam()


I have a small (n = 28) dataset of occurrence counts for three seabirds, and I have fit a hurdle GAM (using mgcv::gam()): first a binary model on presence/absence, then a negative binomial model on the presences only. Subsetting to presences brings the sample size to 12, 12, and 22 for the three seabirds. The seabird data are also overdispersed, with many zeros and often low counts (< 10) at presence points. These are the models for each seabird:

#three seabirds; prion, storm petrel, sooty shearwater

prion_binary <- mgcv::gam(prion_binary ~ s(avg_SST) + 
                            s(avg_SSS)+ 
                            s(delta_SST)+
                            s(delta_SSS)+
                            s(distance, k=8)+ # 9 different distances
                            s(total_zp)+ #total zooplankton
                            s(trip_factor,bs = "re"),
                          method = "ML",
                          family = binomial(link = "logit"), 
                          data = seabird)
prion_count <- mgcv::gam(prion ~ s(avg_SST) + 
                           s(avg_SSS)+ 
                           s(delta_SST)+
                           s(delta_SSS)+
                           s(distance, k=5)+ # 6 different distances
                           s(total_zp)+ #total zooplankton
                           s(trip_factor,bs = "re"), 
                         method = "ML",
                         family = "ziP", 
                         data = seabird[seabird$prion >0,])

My issue is that the model output shows very high deviance explained yet essentially no significant predictors. In one case the adjusted R² was even negative. I think I likely have too many predictors, but when I run univariate models they all come out with high deviance explained and p < 0.05, so I'm not sure which to remove. The residual plots also don't align with such high deviance explained.

I'm not sure where to go next, so any help would be appreciated.

This is the output from the three seabird models:

prion_binary

Family: binomial 
Link function: logit 

Formula:
prion_binary ~ s(avg_SST) + s(avg_SSS) + s(delta_SST) + s(delta_SSS) + 
    s(distance, k = 8) + s(total_zp) + s(trip_factor, bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -2.557      4.021  -0.636    0.525

Approximate significance of smooth terms:
                 edf Ref.df Chi.sq p-value
s(avg_SST)     1.000  1.000  0.687   0.407
s(avg_SSS)     1.000  1.000  0.324   0.569
s(delta_SST)   1.000  1.000  0.282   0.596
s(delta_SSS)   1.000  1.000  0.440   0.507
s(distance)    1.000  1.000  0.963   0.326
s(total_zp)    1.742  2.051  0.782   0.736
s(trip_factor) 1.349  3.000  3.615   0.120

R-sq.(adj) =  0.995   Deviance explained = 97.5%
-ML = 6.8535  Scale est. = 1         n = 28

prion binary residuals

prion count:

Family: Negative Binomial(2277108.965) 
Link function: log 

Formula:
prion ~ s(avg_SST) + s(avg_SSS) + s(delta_SST) + s(delta_SSS) + 
    s(distance, k = 5) + s(total_zp) + s(trip_factor, bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   0.8218     0.1979   4.153 3.29e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                     edf Ref.df Chi.sq p-value
s(avg_SST)     1.000e+00      1  0.017   0.896
s(avg_SSS)     1.000e+00      1  0.675   0.411
s(delta_SST)   1.000e+00      1  0.085   0.771
s(delta_SSS)   1.000e+00      1  0.148   0.700
s(distance)    1.000e+00      1  0.727   0.394
s(total_zp)    1.000e+00      1  0.059   0.809
s(trip_factor) 1.016e-07      2  0.000   0.508

R-sq.(adj) =  -0.177   Deviance explained = 55.1%
-ML = 17.514  Scale est. = 1         n = 12

prion count residuals

sooty shearwater binary

Family: binomial 
Link function: logit 

Formula:
shearwater_binary ~ s(avg_SST) + s(avg_SSS) + s(delta_SST) + 
    s(delta_SSS, k = 15) + s(distance, k = 8) + s(total_zp) + 
    s(trip_factor, bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    3.611      2.656    1.36    0.174

Approximate significance of smooth terms:
                 edf Ref.df Chi.sq p-value   
s(avg_SST)     1.000      1  0.917 0.33829   
s(avg_SSS)     1.000      1  0.914 0.33915   
s(delta_SST)   1.000      1  0.000 0.99210   
s(delta_SSS)   1.000      1  0.017 0.89504   
s(distance)    1.000      1  0.004 0.94848   
s(total_zp)    1.000      1  0.113 0.73652   
s(trip_factor) 1.141      3 11.683 0.00514 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.472   Deviance explained = 59.4%
-ML = 9.4597  Scale est. = 1         n = 28

sooty binary residuals

shearwater count

 Family: Negative Binomial(6642419.022) 
Link function: log 

Formula:
shearwater ~ s(avg_SST) + s(avg_SSS) + s(delta_SST) + s(delta_SSS) + 
    s(distance, k = 8) + s(total_zp) + s(trip_factor, bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   1.9002     0.1211    15.7   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                     edf Ref.df  Chi.sq  p-value    
s(avg_SST)     1.000e+00  1.000   1.841   0.1748    
s(avg_SSS)     3.606e+00  4.272 125.264  < 2e-16 ***
s(delta_SST)   1.000e+00  1.000   5.619   0.0178 *  
s(delta_SSS)   1.000e+00  1.000  18.094 2.11e-05 ***
s(distance)    4.657e+00  5.328 277.393  < 2e-16 ***
s(total_zp)    1.000e+00  1.000  10.490   0.0012 ** 
s(trip_factor) 9.002e-07  3.000   0.000   0.4375    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.999   Deviance explained = 99.5%
-ML = 67.138  Scale est. = 1         n = 22

sooty count residuals

storm petrel binary

 Family: binomial 
Link function: logit 

Formula:
storm_petrel_binary ~ s(avg_SST) + s(avg_SSS) + s(delta_SST) + 
    s(delta_SSS) + s(distance, k = 8) + s(total_zp) + s(trip_factor, 
    bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.7884     0.9928  -0.794    0.427

Approximate significance of smooth terms:
                     edf Ref.df Chi.sq p-value
s(avg_SST)     1.000e+00   1.00  0.174   0.676
s(avg_SSS)     1.000e+00   1.00  0.190   0.663
s(delta_SST)   1.000e+00   1.00  1.038   0.308
s(delta_SSS)   1.000e+00   1.00  0.000   0.996
s(distance)    3.003e+00   3.69  5.302   0.213
s(total_zp)    1.000e+00   1.00  0.039   0.844
s(trip_factor) 5.115e-07   3.00  0.000   0.369

R-sq.(adj) =  0.595   Deviance explained = 64.9%
-ML = 12.629  Scale est. = 1         n = 28

storm petrel binary residuals

storm petrel count

Family: Negative Binomial(1572380.699) 
Link function: log 

Formula:
storm_petrel ~ s(avg_SST) + s(avg_SSS) + s(delta_SST) + s(delta_SSS) + 
    s(distance, k = 5) + s(total_zp) + s(trip_factor, bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   0.8340     0.2065   4.039 5.36e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                     edf Ref.df Chi.sq p-value  
s(avg_SST)     1.000e+00      1  2.654  0.1033  
s(avg_SSS)     1.000e+00      1  0.861  0.3535  
s(delta_SST)   1.000e+00      1  1.389  0.2386  
s(delta_SSS)   1.000e+00      1  4.626  0.0315 *
s(distance)    1.000e+00      1  0.562  0.4534  
s(total_zp)    1.000e+00      1  0.580  0.4463  
s(trip_factor) 1.018e-07      2  0.000  0.2196  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.647   Deviance explained = 73.8%
-ML = 19.901  Scale est. = 1         n = 12

storm petrel count residuals


Solution

  • What I'm guessing is going on (though I can't know without seeing the data) is that your explanatory variables are highly correlated with each other. The significance of each variable is assessed by how much additional variance it explains when added to a reduced model containing all of the other variables. So if your explanatory variables are collinear, adding another one isn't going to explain much variance that the others haven't already explained.
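    One quick way to check this (a sketch; it assumes your fitted model object and the predictor columns named in the question) is to look at the pairwise correlations of the predictors, and at mgcv's concurvity measure, which generalizes collinearity to smooth terms:

    ```r
    # Pairwise correlations among the candidate predictors
    # (column names taken from the models in the question)
    preds <- seabird[, c("avg_SST", "avg_SSS", "delta_SST",
                         "delta_SSS", "distance", "total_zp")]
    round(cor(preds, use = "pairwise.complete.obs"), 2)

    # Concurvity: how well each smooth can be approximated by the others.
    # Values near 1 indicate the term is nearly redundant.
    mgcv::concurvity(prion_binary, full = TRUE)
    ```

    Several terms with "worst" concurvity values close to 1 would support the collinearity explanation.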

    Also, you definitely have too many predictors for the amount of data you have. That alone could quite possibly be the reason your explained deviance is so high. With only 12 data points, you probably don't want more than one or two predictors (though read elsewhere for other opinions).

    One possible way forward would be to do a principal component analysis of your explanatory variables, or of a subset of your explanatory variables that would naturally group together. If one or two principal components explain a large proportion of the variance in your explanatory variables, then use those principal components as your predictors instead.
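    A minimal sketch of that approach (assuming the environmental predictor columns from the question, here grouped without distance):

    ```r
    # PCA on the scaled environmental predictors
    env <- seabird[, c("avg_SST", "avg_SSS", "delta_SST",
                       "delta_SSS", "total_zp")]
    pca <- prcomp(env, scale. = TRUE)
    summary(pca)  # proportion of variance explained per component

    # If PC1/PC2 capture most of the variance, use them as predictors
    seabird$PC1 <- pca$x[, 1]
    seabird$PC2 <- pca$x[, 2]
    prion_binary_pca <- mgcv::gam(prion_binary ~ s(PC1) + s(PC2) +
                                    s(trip_factor, bs = "re"),
                                  method = "ML",
                                  family = binomial(link = "logit"),
                                  data = seabird)
    ```

    The trade-off is interpretability: each principal component mixes several of the original variables, so the ecological meaning of a fitted smooth is less direct.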

    Another possibility would be to jettison any predictors that seem less important a priori (emphasis on the a priori part).

    Also, you will probably get better answers than this on Stats.SE.