
glm for multiple variables in R


I want to model each SNP in my SNP array. I can do this one SNP at a time with the following code:

Data$DX=as.factor(Data$DX)
univariate=glm(relevel(DX, "CON") ~ relevel(rs6693065_D,"AA"), family = binomial, data = Data)
summary(univariate)
exp(cbind(OR = coef(univariate), confint(univariate)))

How can I do this for all the other SNPs using a loop or an apply function? The SNPs are rs6693065_D, rs6693065_A, and hundreds more. In the code above, only "rs6693065_D" needs to be replaced by each of the other SNPs. Best regards, Zillur


Solution

  • Consider writing a generalized function that handles any single SNP column, then call it on every SNP column with lapply or sapply:

    # GENERALIZED METHOD
    proc_glm <- function(snps) {
       univariate <- glm(relevel(Data$DX, "CON") ~ relevel(snps, "AA"), family = binomial)
    
       return(exp(cbind(OR = coef(univariate), confint(univariate))))
    }
    
    # BUILD LIST OF FUNCTION OUTPUT 
    glm_list <- lapply(Data[3:426], proc_glm)
    

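    Wrap the call in tryCatch so that an error in one column, such as relevel failing because the "AA" level is missing, does not stop the whole loop: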

    # BUILD LIST OF FUNCTION OUTPUT 
    glm_list <- lapply(Data[3:426], function(col) 
                       tryCatch(proc_glm(col), error = function(e) e))
    

    To build a single data frame instead, change the function to take a column name and return a data frame, then combine the lapply results with do.call and rbind:

    proc_glm <- function(col){
      # BUILD FORMULA BY STRING, USING THE SAME RELEVEL CALLS AS THE SINGLE-SNP MODEL
      fml <- as.formula(paste0('relevel(DX, "CON") ~ relevel(', col, ', "AA")'))
      univariate <- glm(fml, family = binomial, data = Data)
    
      # RETURN DATA FRAME OF COLUMN AND ESTIMATES
      cbind.data.frame(COL = col,
                       exp(cbind(OR = coef(univariate), confint(univariate)))
      )
    }
    
    # BUILD LIST OF DFs, PASSING COLUMN NAMES
    # (FAILED COLUMNS RETURN NULL, WHICH rbind DROPS)
    glm_list <- lapply(names(Data)[3:426], function(col)
                       tryCatch(proc_glm(col), error = function(e) NULL))
    
    # APPEND ALL DFs FOR SINGLE MASTER DF
    final_df <- do.call(rbind, glm_list)
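
To try the whole workflow without the real data, here is a minimal, self-contained sketch on simulated genotypes. Everything in it beyond the pattern above is an assumption made for illustration: the column names (rs0000001_D and so on), the genotype coding, the helper name fit_snp, and the extra TERM column. It also swaps confint() for confint.default(), which gives Wald intervals rather than profile-likelihood ones, purely to keep the example quick.

```r
# Simulated data; column names and genotype coding are hypothetical.
set.seed(42)
n <- 200
genotypes <- c("AA", "AG", "GG")

Data <- data.frame(
  ID = seq_len(n),
  DX = factor(sample(c("CON", "CASE"), n, replace = TRUE)),
  rs0000001_D = factor(sample(genotypes, n, replace = TRUE), levels = genotypes),
  rs0000002_D = factor(sample(genotypes, n, replace = TRUE), levels = genotypes),
  # This SNP has no "AA" level, so relevel() raises an error for it.
  rs0000003_D = factor(sample(c("AG", "GG"), n, replace = TRUE))
)

# SNP columns start at position 3 here, mirroring Data[3:426] above.
snp_cols <- names(Data)[3:ncol(Data)]

# Hypothetical helper: fit one SNP by column name, return odds ratios as a data frame.
fit_snp <- function(col) {
  fml <- as.formula(sprintf('relevel(DX, "CON") ~ relevel(%s, "AA")', col))
  fit <- glm(fml, family = binomial, data = Data)
  # Wald intervals (assumption for speed); use confint(fit) for profile intervals.
  est <- exp(cbind(OR = coef(fit), confint.default(fit)))
  data.frame(SNP = col, TERM = rownames(est), est,
             row.names = NULL, check.names = FALSE)
}

# Keep the error object for failed columns so they can be inspected later.
results <- lapply(snp_cols, function(col)
  tryCatch(fit_snp(col), error = function(e) e))
names(results) <- snp_cols

# Separate failures from successes, then stack the successes.
failed <- vapply(results, inherits, logical(1), what = "error")
final_df <- do.call(rbind, results[!failed])
rownames(final_df) <- NULL

print(names(results)[failed])  # SNPs whose model could not be fitted
print(final_df)
```

Keeping the error objects in the list, rather than discarding them, makes it possible to report which SNPs failed and why (for example with conditionMessage(results[["rs0000003_D"]])) before the successful fits are combined.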