Search code examples
rloops

Getting count from a loop


I have a loop in my code that pulls summary statistics from each logistic regression I perform on each dataset. I would like it to also show the count of each level for each variable if possible. I am pretty new to loops though, and am not quite sure how to ask R how to do it.

Here is an example of the code I am currently using. The data I am using is different but the rest of the analysis is nearly the same.

library(tidyverse)
install.packages("AER")
library("AER")
data(Affairs, package="AER")
Affairs$ynaffair[Affairs$affairs >  0] <- 1
Affairs$ynaffair[Affairs$affairs == 0] <- 0

Affairs <- Affairs %>% 
  mutate_at(c("affairs", "religiousness", "occupation", "rating", "ynaffair"), as.factor) 

frmlas <- list(ynaffair~gender,
               ynaffair~children,
               ynaffair~religiousness,
               ynaffair~occupation,
               ynaffair~rating)
output <- list() 
output_df_list <- list() 

for(i in 1:length(frmlas)){
  output[[i]] <- glm(frmlas[[i]], data = Affairs, family=binomial)
  names(output)[i] <- capture.output(frmlas[[i]]) 
  output_df_list[[i]] <- data.frame("Model"=capture.output(frmlas[[i]]),
                                    "Covariate"=names(output[[i]]$coefficients),
                                    "Beta_Estimate"=output[[i]]$coefficients,
                                    "P_val"=summary(output[[i]])$coefficients[,4],
                                    "CI_95_LL"=confint(output[[i]])[,1],
                                    "CI_95_UL"=confint.default(output[[i]])[,2])
    rownames(output_df_list[[i]]) <- NULL             
}  

output_df_full_t1 <- do.call("rbind", output_df_list)

output_df_full_t1 <- output_df_full_t1 %>% 
  mutate(OR = exp(Beta_Estimate)) %>% 
  mutate(CI_95_LL = exp(CI_95_LL),
         CI_95_UL = exp(CI_95_UL))  %>% 
  select(Model, Covariate, OR, CI_95_LL, CI_95_UL, P_val) %>% 
  filter(!(Covariate %in% '(Intercept)'))  %>% 
  mutate_if(is.numeric, round, digits = 3)   %>% 
  unite(CI, c(CI_95_LL, CI_95_UL), sep = ", ", remove = TRUE) 

head(output_df_full_t1)

enter image description here

This is where I found this piece of code, and then modified it slightly: https://bookdown.org/kdonovan125/ibis_data_analysis_r4/intro-7.html

The expected result would be a new column with the counts of each level in the output. Or if that isn't possible, and you have an alternative suggestion, I would be grateful for the advice. My mentor/boss just doesn't want me copy/ pasting the counts for each one. The end goal is to output the result to an excel file.

Desired final output:

enter image description here

I need to do this many times, so figuring this out would be very time-saving for me.

Thanks!


Solution

  • The output from glm includes the data used in the model. So you can table that in your loop and include the results in the list of data.frames. The results are the frequency counts for each level of the factor. The column names in the data frames are "n.Var1" and "n.Freq", which you then include in your select statement. However, the reference groups will be omitted if you filter out the rows containing (Intercepts), so perhaps omit that line.

    for(i in 1:length(frmlas)){
      output[[i]] <- glm(frmlas[[i]], data = Affairs, family=binomial)
      names(output)[i] <- capture.output(frmlas[[i]]) 
      output_df_list[[i]] <- data.frame("Model"=capture.output(frmlas[[i]]),
                                        "Covariate"=names(output[[i]]$coefficients),
                                        "Beta_Estimate"=output[[i]]$coefficients,
                                        "P_val"=summary(output[[i]])$coefficients[,4],
                                        "CI_95_LL"=confint(output[[i]])[,1],
                                        "CI_95_UL"=confint.default(output[[i]])[,2],
                                        "n"=table(output[[i]]$model[,2])) # ADD
      rownames(output_df_list[[i]]) <- NULL             
    }  
    
    output_df_full_t1 <- do.call("rbind", output_df_list)
    
    output_df_full_t1 <- output_df_full_t1 %>% 
      mutate(OR = exp(Beta_Estimate)) %>% 
      mutate(CI_95_LL = exp(CI_95_LL),
             CI_95_UL = exp(CI_95_UL))  %>% 
      select(Model, Covariate, OR, CI_95_LL, CI_95_UL, P_val, n.Var1, n.Freq) %>% 
      # filter(!(Covariate %in% '(Intercept)'))  %>% 
      mutate_if(is.numeric, round, digits = 3)   %>% 
      unite(CI, c(CI_95_LL, CI_95_UL), sep = ", ", remove = TRUE) 
    
    head(output_df_full_t1)
    

                         Model      Covariate    OR           CI P_val n.Var1 n.Freq
    1        ynaffair ~ gender    (Intercept) 0.296 0.226, 0.385 0.000 female    315
    2        ynaffair ~ gender     gendermale 1.266 0.874, 1.832 0.212   male    286
    3      ynaffair ~ children    (Intercept) 0.188 0.122, 0.283 0.000     no    171
    4      ynaffair ~ children    childrenyes 2.137 1.365, 3.389 0.001    yes    430
    5 ynaffair ~ religiousness    (Intercept) 0.714 0.397, 1.268 0.250      1     48
    6 ynaffair ~ religiousness religiousness2 0.467 0.238, 0.916 0.027      2    164
    

    If you don't want the rows containing (Intercept) then you will have to think of another way to include the counts of the reference groups in your output table.