Search code examples
rvariancegammgcv

GAM R variance explained by variable


My current problem is to calculate the variance explained by the different variables of a general additive model (GAM) with R.

I followed the explanation given by Wood here : https://stat.ethz.ch/pipermail/r-help/2007-October/142743.html

But I would like to do it with three variables. I tried this :

library(mgcv)

set.seed(0)
n<-400
x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1) 
x3 <- runif(n, 0, 1) 

f1 <- function(x) exp(2 * x) - 3.75887
f2 <- function(x) 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10
f3 <- function(x) 0.008*x^2 - 1.8*x + 874
f <- f1(x1) + f2(x2) + f3(x3)
e <- rnorm(n, 0, 2)
y <- f + e

b <- gam(y ~ s(x1, k = 3)+s(x2, k = 3)+ s(x3, k = 3))
b3 <- gam(y ~ s(x1) + s(x2), sp = c(b$sp[1], b$sp[2]))
b2 <- gam(y ~ s(x1) + s(x3), sp = c(b$sp[1], b$sp[3]))
b1 <- gam(y ~ s(x2) + s(x3), sp = c(b$sp[2], b$sp[3]))

b0 <- gam(y~1)

(deviance(b1)-deviance(b))/deviance(b0)
(deviance(b2)-deviance(b))/deviance(b0)
(deviance(b3)-deviance(b))/deviance(b0)

But I don't understand results. For example, the model with only x1 and x2 has a deviance smaller than deviance with the three explanatory variable.

Does the method I used to extract variance explained by variable with three variables is correct?

Does it mean that there is a confounding effect in the global model? Or is there another explanation?

Thanks a lot.


Solution

  • You did something wrong here:

    b <- gam(y ~ s(x1, k = 3) + s(x2, k = 3) + s(x3, k = 3))
    b3 <- gam(y ~ s(x1) + s(x2), sp = c(b$sp[1], b$sp[2]))
    b2 <- gam(y ~ s(x1) + s(x3), sp = c(b$sp[1], b$sp[3]))
    b1 <- gam(y ~ s(x2) + s(x3), sp = c(b$sp[2], b$sp[3]))
    

    Why did you set k = 3 in the first line, while not setting k = 3 for the rest? Without specifying k, s() will take default value k = 10. Now you get a problem: b1, b2, b3 are not nested in b.

    In Simon Wood's original example, he left k unspecified, so that k=10 is taken for all s(). In fact, you can vary k values, but you must gurantee that you always have the same k for the same covariate (to ensure nesting). For example, you can do:

    b <- gam(y ~ s(x1, k = 4) + s(x2, k = 6) + s(x3, k = 3))
    b3 <- gam(y ~ s(x1, k = 4) + s(x2, k = 6), sp = c(b$sp[1], b$sp[2]))  ## droping s(x3) from b
    b2 <- gam(y ~ s(x1, k = 4) + s(x3, k = 3), sp = c(b$sp[1], b$sp[3]))  ## droping s(x2) from b
    b1 <- gam(y ~ s(x2, k = 6) + s(x3, k = 3), sp = c(b$sp[2], b$sp[3]))  ## droping s(x1) from b
    

    Then let's do:

    (deviance(b1)-deviance(b))/deviance(b0)
    # [1] 0.2073421
    (deviance(b2)-deviance(b))/deviance(b0)
    # [1] 0.4323154
    (deviance(b3)-deviance(b))/deviance(b0)
    # [1] 0.02094997
    

    The positive values imply that dropping any model term will inflate the deviance, which is sensible as our true model have all three terms.