As a preliminary to creating a multivariate logistic regression I'm doing univariate regressions and want to select the variables with p < 0.20 to include in the multivariate model. I can map the required variables to glm
and get the output of the models, but am struggling to order them by rank of p-value.
This is what I have so far:
predictor1 <- c(0,1.1,2.4,3.1,4.0,5.9,4.2,3.3,2.2,1.1)
predictor2 <- as.factor(c("yes","no","no","yes","yes","no","no","yes","no","no"))
predictor3 <- as.factor(c("a", "b", "c", "c", "a", "c", "a", "a", "a", "c"))
outcome <- as.factor(c("alive","dead","alive","dead","alive","dead","alive","dead","alive","dead"))
df <- data.frame(pred1 = predictor1, pred2 = predictor2, pred3 = predictor3, outcome = outcome)
predictors <- c("pred1", "pred2", "pred3")
df %>%
select(predictors) %>%
map(~ glm(df$outcome ~ .x, data = df, family = "binomial")) %>%
#Extract odds ratio, confidence interval lower and upper bounds, and p value
map(function (x, y) data.frame(OR = exp(coef(x)),
lower=exp(confint(x)[,1]),
upper=exp(confint(x)[,2]),
Pval = coef(summary(x))[,4]))
This code spits out a summary of each model
$pred1
OR lower upper Pval
(Intercept) 0.711082 0.04841674 8.521697 0.7818212
.x 1.133085 0.52179227 2.653040 0.7465663
$pred2
OR lower upper Pval
(Intercept) 1 0.18507173 5.40331 1
.xyes 1 0.07220425 13.84960 1
$pred3
OR lower upper Pval
(Intercept) 0.25 0.0127798 1.689944 0.2149978
.xb 170179249.43 0.0000000 NA 0.9961777
.xc 12.00 0.6908931 542.678010 0.1220957
but with my real dataset there are dozens of predictors so I need a way to order the output. Preferably by the minimum (non-intercept) p-value in each model. Maybe the data structure I've chosen for the summary of each model isn't the best, so any suggestions on how to get the same information in a more flexible data structure would be good too.
Use map_dfr
instead of map
, filter rows with intercept then do arrange
. Use tidy
from broom
instead of your custom function.
library(broom)
df %>%
select(predictors) %>%
map(~ glm(df$outcome ~ .x, data = df, family = "binomial")) %>%
map_dfr(tidy, .id='Model') %>%
filter(term!="(Intercept)") %>% arrange(p.value)
# A tibble: 4 x 6
Model term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 pred3 .xc 2.48e+ 0 1.61 1.55e+ 0 0.122
2 pred1 .x 1.25e- 1 0.387 3.23e- 1 0.747
3 pred3 .xb 1.90e+ 1 3956. 4.79e- 3 0.996
4 pred2 .xyes -5.73e-16 1.29 -4.44e-16 1.000