
How to add combinations of variables into a regression


I'm trying to write code that presents the regression coefficients for different combinations of variables. After each regression another variable should be added, so that in the end I get a list presenting the coefficients of each regression. I would like to allow only 1-4 variables in each regression. Here is my sample data:

dat <- read.table(text = " var1 var2    var3     var4
    0        3        9         7
    1        3        8         4
    1        1        2         8
    0        1        2         3
    0        1        8         3
    1        6        1         2
    0        6        7         1
    1        6        1         5
    0        5        9         7
    1        3        8         7
    1        4        2         7
    0        1        2         3
    0        7        6         3
    1        6        1         1
    0        6        3         9
    1        6        1         1   ",header = TRUE)

I managed to get a list that shows the coefficients of the regression between a specific variable (var1) and each of the other variables by using this code:

t(sapply(setdiff(names(dat),"var1"),
              function(x) coef(glm(reformulate(x,response="var1"),
                                   data=dat,family=binomial(link='logit')))))

Here is the output:

     (Intercept)        var2
var2 -0.56394149  0.13865097
var3  1.28295290 -0.29798823
var4  0.08075091 -0.01819781
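In the output above the second column is headed `var2` only because `sapply()` takes the row names of its simplified result from the first fit (the `var2` model); in every row that column actually holds the slope of that row's predictor. A minimal variant of the same loop that names the two columns explicitly (the names `intercept` and `slope` are illustrative choices, not anything `glm()` produces):

```r
# Same sample data as above
dat <- data.frame(
  var1 = c(0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1),
  var2 = c(3, 3, 1, 1, 1, 6, 6, 6, 5, 3, 4, 1, 7, 6, 6, 6),
  var3 = c(9, 8, 2, 2, 8, 1, 7, 1, 9, 8, 2, 2, 6, 1, 3, 1),
  var4 = c(7, 4, 8, 3, 3, 2, 1, 5, 7, 7, 7, 3, 3, 1, 9, 1)
)

# One logistic fit per predictor; return intercept and slope under neutral
# names so the column headers no longer depend on the first predictor
single <- t(sapply(setdiff(names(dat), "var1"), function(x) {
  cf <- coef(glm(reformulate(x, response = "var1"), data = dat,
                 family = binomial(link = "logit")))
  c(intercept = unname(cf[1]), slope = unname(cf[2]))
}))
single
```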

However, I would like to add regressions that use combinations of variables and present the results in a table along with each model's p-value, for example:

           (Intercept)        var2 var3 var4  p-value
var2       -0.56394149  0.13865097            0.02
var3        1.28295290 -0.29798823            0.01
var4        0.08075091 -0.01819781            0.2
var2+var3 
var2+var4
var3+var4
var2+var3+var4

Any idea how this can be done? Thank you.


Solution

  • You could use dredge() from the MuMIn package for this:

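    library(MuMIn)
    # Global logistic model; dredge() requires na.action = na.fail
    mod <- glm(var1 ~ var2 + var3 + var4, data = dat,
               family = binomial(link = "logit"), na.action = na.fail)
    # Null (intercept-only) model, the reference for the p-value
    mod0 <- glm(var1 ~ 1, data = dat, family = binomial(link = "logit"))
    # Fit all predictor subsets; "pval" is a likelihood-ratio test vs. mod0
    allmods <- dredge(mod, extra = list(pval = function(x)
      anova(mod0, x, test = "Chisq")[2, "Pr(>Chi)"]))
    allmods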
    

    The pval column is the p-value from comparing each candidate model with the null (intercept-only) model mod0 via anova().
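If you would rather not depend on an extra package, the table sketched in the question can also be built in base R. This is an alternative to MuMIn, not what dredge() does internally: it enumerates every predictor subset with combn(), fits a binomial glm for each, and puts the coefficients and a p-value in one row per model. It assumes the p-value wanted is a likelihood-ratio (chi-squared) test against the intercept-only model; the cap `max_vars` mirrors the question's 1-4 limit, and the names `combos`, `null_fit` and `res` are illustrative.

```r
# Same sample data as in the question
dat <- data.frame(
  var1 = c(0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1),
  var2 = c(3, 3, 1, 1, 1, 6, 6, 6, 5, 3, 4, 1, 7, 6, 6, 6),
  var3 = c(9, 8, 2, 2, 8, 1, 7, 1, 9, 8, 2, 2, 6, 1, 3, 1),
  var4 = c(7, 4, 8, 3, 3, 2, 1, 5, 7, 7, 7, 3, 3, 1, 9, 1)
)
preds <- setdiff(names(dat), "var1")

# All non-empty predictor subsets with at most max_vars terms
# (with only 3 predictors this is every subset, i.e. 7 models)
max_vars <- min(4, length(preds))
combos <- unlist(lapply(seq_len(max_vars),
                        function(k) combn(preds, k, simplify = FALSE)),
                 recursive = FALSE)

# Intercept-only model used as the reference for every p-value
null_fit <- glm(var1 ~ 1, data = dat, family = binomial(link = "logit"))

rows <- lapply(combos, function(vars) {
  fit <- glm(reformulate(vars, response = "var1"), data = dat,
             family = binomial(link = "logit"))
  # Likelihood-ratio test of this model against the null model
  lrt <- anova(null_fit, fit, test = "Chisq")
  # One slot per possible coefficient; predictors not in the model stay NA
  out <- setNames(rep(NA_real_, length(preds) + 2),
                  c("(Intercept)", preds, "p-value"))
  cf <- coef(fit)
  out[names(cf)] <- cf
  out["p-value"] <- lrt[2, "Pr(>Chi)"]
  out
})

res <- do.call(rbind, rows)
rownames(res) <- vapply(combos, paste, character(1), collapse = "+")
round(res, 4)
```

Keeping the rows in combn() order gives the single-variable models first, then the pairs, then the full model, which matches the layout of the desired table; empty cells show up as NA.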