I have a loop in my code that pulls summary statistics from each logistic regression I perform on each dataset. I would like it to also show the count of each level for each variable if possible. I am pretty new to loops though, and am not quite sure how to ask R how to do it.
Here is an example of the code I am currently using. The data I am using is different but the rest of the analysis is nearly the same.
library(tidyverse)
install.packages("AER")
library("AER")
data(Affairs, package="AER")
Affairs$ynaffair[Affairs$affairs > 0] <- 1
Affairs$ynaffair[Affairs$affairs == 0] <- 0
Affairs <- Affairs %>%
mutate_at(c("affairs", "religiousness", "occupation", "rating", "ynaffair"), as.factor)
frmlas <- list(ynaffair~gender,
ynaffair~children,
ynaffair~religiousness,
ynaffair~occupation,
ynaffair~rating)
output <- list()
output_df_list <- list()
for(i in 1:length(frmlas)){
output[[i]] <- glm(frmlas[[i]], data = Affairs, family=binomial)
names(output)[i] <- capture.output(frmlas[[i]])
output_df_list[[i]] <- data.frame("Model"=capture.output(frmlas[[i]]),
"Covariate"=names(output[[i]]$coefficients),
"Beta_Estimate"=output[[i]]$coefficients,
"P_val"=summary(output[[i]])$coefficients[,4],
"CI_95_LL"=confint(output[[i]])[,1],
"CI_95_UL"=confint.default(output[[i]])[,2])
rownames(output_df_list[[i]]) <- NULL
}
output_df_full_t1 <- do.call("rbind", output_df_list)
output_df_full_t1 <- output_df_full_t1 %>%
mutate(OR = exp(Beta_Estimate)) %>%
mutate(CI_95_LL = exp(CI_95_LL),
CI_95_UL = exp(CI_95_UL)) %>%
select(Model, Covariate, OR, CI_95_LL, CI_95_UL, P_val) %>%
filter(!(Covariate %in% '(Intercept)')) %>%
mutate_if(is.numeric, round, digits = 3) %>%
unite(CI, c(CI_95_LL, CI_95_UL), sep = ", ", remove = TRUE)
head(output_df_full_t1)
This is where I found this piece of code, and then modified it slightly: https://bookdown.org/kdonovan125/ibis_data_analysis_r4/intro-7.html
The expected result would be a new column with the counts of each level in the output. Or if that isn't possible, and you have an alternative suggestion, I would be grateful for the advice. My mentor/boss just doesn't want me copy/ pasting the counts for each one. The end goal is to output the result to an excel file.
Desired final output:
I need to do this many times, so figuring this out would be very time-saving for me.
Thanks!
The output from glm includes the data used in the model. So you can table
that in your loop and include the results in the list of data.frames. The results are the frequency counts for each level of the factor. The column names in the data frames are "n.Var1" and "n.Freq", which you then include in your select
statement. However, the reference groups will be omitted if you filter out the rows containing (Intercepts)
, so perhaps omit that line.
for(i in 1:length(frmlas)){
output[[i]] <- glm(frmlas[[i]], data = Affairs, family=binomial)
names(output)[i] <- capture.output(frmlas[[i]])
output_df_list[[i]] <- data.frame("Model"=capture.output(frmlas[[i]]),
"Covariate"=names(output[[i]]$coefficients),
"Beta_Estimate"=output[[i]]$coefficients,
"P_val"=summary(output[[i]])$coefficients[,4],
"CI_95_LL"=confint(output[[i]])[,1],
"CI_95_UL"=confint.default(output[[i]])[,2],
"n"=table(output[[i]]$model[,2])) # ADD
rownames(output_df_list[[i]]) <- NULL
}
output_df_full_t1 <- do.call("rbind", output_df_list)
output_df_full_t1 <- output_df_full_t1 %>%
mutate(OR = exp(Beta_Estimate)) %>%
mutate(CI_95_LL = exp(CI_95_LL),
CI_95_UL = exp(CI_95_UL)) %>%
select(Model, Covariate, OR, CI_95_LL, CI_95_UL, P_val, n.Var1, n.Freq) %>%
# filter(!(Covariate %in% '(Intercept)')) %>%
mutate_if(is.numeric, round, digits = 3) %>%
unite(CI, c(CI_95_LL, CI_95_UL), sep = ", ", remove = TRUE)
head(output_df_full_t1)
Model Covariate OR CI P_val n.Var1 n.Freq
1 ynaffair ~ gender (Intercept) 0.296 0.226, 0.385 0.000 female 315
2 ynaffair ~ gender gendermale 1.266 0.874, 1.832 0.212 male 286
3 ynaffair ~ children (Intercept) 0.188 0.122, 0.283 0.000 no 171
4 ynaffair ~ children childrenyes 2.137 1.365, 3.389 0.001 yes 430
5 ynaffair ~ religiousness (Intercept) 0.714 0.397, 1.268 0.250 1 48
6 ynaffair ~ religiousness religiousness2 0.467 0.238, 0.916 0.027 2 164
If you don't want the rows containing (Intercept)
then you will have to think of another way to include the counts of the reference groups in your output table.