
Using loops to do Chi-Square Test in R


I am new to R. I found the following code for running univariate logistic regression on a set of variables. What I would like to do is run a chi-square test for a list of variables against the dependent variable, similar to the logistic regression code below. I found a couple of examples that involve creating all possible combinations of the variables, but I can't get them to work. Ideally, I want one of the variables (X) to stay the same in every test.

Chi Square Analysis using for loop in R

lapply(c("age","sex","race","service","cancer",
         "renal","inf","cpr","sys","heart","prevad",
         "type","frac","po2","ph","pco2","bic","cre","loc"),

       function(var) {

         formula    <- as.formula(paste("status ~", var))
         res.logist <- glm(formula, data = icu, family = binomial)

         summary(res.logist)
       })

Solution

  • Are you sure that the strings in the vector you lapply over are in the column names of the icu dataset?
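
    One quick way to check this is to compare the requested names against the column names. This is only a sketch: `icu_demo` is a made-up stand-in for the real data and `vars` is a hypothetical vector of names, so substitute your own.

    ```r
    # Toy stand-in for the icu data; columns and values are assumptions for illustration
    icu_demo <- data.frame(STA = c(0, 1, 0, 1),
                           Age = c(27, 59, 77, 54),
                           Sex = c(0, 0, 1, 0))

    # Hypothetical list of variables to loop over
    vars <- c("age", "Sex", "race")

    # Names requested but absent from the data frame (matching is case-sensitive)
    missing_vars <- setdiff(vars, colnames(icu_demo))
    missing_vars
    ```

    Anything returned here would make `glm()` or `chisq.test()` fail inside the loop, so it is worth fixing those names first.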

    It works for me when I download the icu data:

    system("wget http://course1.winona.edu/bdeppa/Biostatistics/Data%20Sets/ICU.TXT")
    icu <- read.table('ICU.TXT', header=TRUE)
    

    and change status to STA, which is a column in icu. Here is an example for some of your variables:

    my.list <- lapply(c("Age","Sex","Race","Ser","Can"),         
           function(var) {
             formula    <- as.formula(paste("STA ~", var))
             res.logist <- glm(formula, data = icu, family = binomial)
             summary(res.logist)
           })
    

    This gives me a list of summary.glm objects. For example:

    lapply(my.list, coefficients)
    [[1]]
                   Estimate Std. Error   z value     Pr(>|z|)
    (Intercept) -3.05851323 0.69608124 -4.393903 1.113337e-05
    Age          0.02754261 0.01056416  2.607174 9.129303e-03
    
    [[2]]
                  Estimate Std. Error    z value     Pr(>|z|)
    (Intercept) -1.4271164  0.2273030 -6.2784758 3.419081e-10
    Sex          0.1053605  0.3617088  0.2912855 7.708330e-01
    
    [[3]]
                  Estimate Std. Error    z value   Pr(>|z|)
    (Intercept) -1.0500583  0.4983146 -2.1072198 0.03509853
    Race        -0.2913384  0.4108026 -0.7091933 0.47820450
    
    [[4]]
                  Estimate Std. Error   z value     Pr(>|z|)
    (Intercept) -0.9465961  0.2310559 -4.096827 0.0000418852
    Ser         -0.9469461  0.3681954 -2.571858 0.0101154495
    
    [[5]]
                     Estimate Std. Error       z value     Pr(>|z|)
    (Intercept) -1.386294e+00  0.1863390 -7.439638e+00 1.009615e-13
    Can          7.523358e-16  0.5892555  1.276756e-15 1.000000e+00
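
    If you only need the p-value of each predictor rather than the full tables, something along these lines should work. It is a sketch on simulated data (`icu_demo` and its values are invented here, not the real ICU data), and it assumes each model has a single numeric predictor, so the predictor sits in row 2 of the coefficient table.

    ```r
    set.seed(1)
    # Simulated stand-in for icu; the values are made up for illustration only
    icu_demo <- data.frame(
      STA = rbinom(100, 1, 0.3),
      Age = round(runif(100, 18, 90)),
      Sex = rbinom(100, 1, 0.4)
    )
    vars <- c("Age", "Sex")

    # One univariate logistic regression per variable, as in the code above
    fits <- lapply(vars, function(var) {
      summary(glm(as.formula(paste("STA ~", var)), data = icu_demo, family = binomial))
    })
    names(fits) <- vars  # label each list element with its variable name

    # Row 2 of each coefficient table is the predictor; "Pr(>|z|)" is its p-value
    p_values <- sapply(fits, function(s) coefficients(s)[2, "Pr(>|z|)"])
    p_values
    ```

    Naming the list elements also makes the printed output easier to read than `[[1]]`, `[[2]]`, and so on.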
    

    If you want to do a chi-square test:

    my.list <- lapply(c("Age", "Sex", "Race", "Ser", "Can"),
                      function(var) chisq.test(icu$STA, icu[, var]))
    

    or a chi-square test for all combinations of variables:

    my.list.all <- apply(combn(colnames(icu), 2), 2,
                         function(x) chisq.test(icu[, x[1]], icu[, x[2]]))
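
    To keep one variable fixed and get the results as a table, the list of tests can be collapsed into a data frame. The sketch below uses simulated data (`icu_demo` is invented for illustration) and only categorical predictors, because `chisq.test()` treats every distinct value of a numeric column such as Age as its own category.

    ```r
    set.seed(42)
    # Simulated stand-in for icu; the values are made up for illustration only
    icu_demo <- data.frame(
      STA  = rbinom(200, 1, 0.2),
      Sex  = rbinom(200, 1, 0.4),
      Race = sample(1:3, 200, replace = TRUE),
      Ser  = rbinom(200, 1, 0.5)
    )
    vars <- c("Sex", "Race", "Ser")

    # STA stays fixed; each variable in vars is tested against it in turn
    tests <- lapply(vars, function(var) chisq.test(icu_demo$STA, icu_demo[, var]))
    names(tests) <- vars

    # Pull the test statistic, degrees of freedom and p-value out of each htest object
    results <- data.frame(
      variable  = vars,
      statistic = sapply(tests, function(tst) unname(tst$statistic)),
      df        = sapply(tests, function(tst) unname(tst$parameter)),
      p.value   = sapply(tests, function(tst) tst$p.value),
      row.names = NULL
    )
    results
    ```

    Note that for 2x2 tables `chisq.test()` applies Yates' continuity correction by default; pass `correct = FALSE` if you do not want it.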
    

    Does this work?