Search code examples
rglm

Ordering a list of dataframes by minimum column value in R


As a preliminary to creating a multivariate logistic regression I'm doing univariate regressions and want to select the variables with p < 0.20 to include in the multivariate model. I can map the required variables to glm and get the output of the models, but am struggling to order them by rank of p-value.

This is what I have so far:

predictor1 <- c(0,1.1,2.4,3.1,4.0,5.9,4.2,3.3,2.2,1.1)
predictor2 <- as.factor(c("yes","no","no","yes","yes","no","no","yes","no","no"))
predictor3 <- as.factor(c("a", "b", "c", "c", "a", "c", "a", "a", "a", "c"))
outcome <- as.factor(c("alive","dead","alive","dead","alive","dead","alive","dead","alive","dead"))
df <- data.frame(pred1 = predictor1, pred2 = predictor2, pred3 = predictor3, outcome = outcome)
predictors <- c("pred1", "pred2", "pred3")
df %>%
    select(predictors) %>%
    map(~ glm(df$outcome ~ .x, data = df, family = "binomial"))  %>%
    #Extract odds ratio, confidence interval lower and upper bounds, and p value
    map(function (x, y) data.frame(OR = exp(coef(x)), 
        lower=exp(confint(x)[,1]), 
        upper=exp(confint(x)[,2]),
        Pval = coef(summary(x))[,4]))

This code spits out a summary of each model

$pred1
                OR   lower          upper           Pval
    (Intercept) 0.711082 0.04841674 8.521697    0.7818212
    .x          1.133085 0.52179227 2.653040    0.7465663
$pred2
                OR   lower          upper           Pval
    (Intercept) 1   0.18507173  5.40331     1
    .xyes   1   0.07220425  13.84960    1
$pred3
                OR   lower          upper           Pval
    (Intercept) 0.25    0.0127798   1.689944    0.2149978
    .xb         170179249.43 0.0000000  NA  0.9961777
    .xc         12.00   0.6908931   542.678010  0.1220957 

but with my real dataset there are dozens of predictors so I need a way to order the output. Preferably by the minimum (non-intercept) p-value in each model. Maybe the data structure I've chosen for the summary of each model isn't the best, so any suggestions on how to get the same information in a more flexible data structure would be good too.


Solution

  • Use map_dfr instead of map, filter rows with intercept then do arrange. Use tidy from broom instead of your custom function.

    library(broom)    
    df %>%
       select(predictors) %>%
       map(~ glm(df$outcome ~ .x, data = df, family = "binomial")) %>%
       map_dfr(tidy, .id='Model') %>% 
       filter(term!="(Intercept)") %>% arrange(p.value)
    
    # A tibble: 4 x 6
    Model term   estimate std.error statistic p.value
    <chr> <chr>     <dbl>     <dbl>     <dbl>   <dbl>
    1 pred3 .xc    2.48e+ 0     1.61   1.55e+ 0   0.122
    2 pred1 .x     1.25e- 1     0.387  3.23e- 1   0.747
    3 pred3 .xb    1.90e+ 1  3956.     4.79e- 3   0.996
    4 pred2 .xyes -5.73e-16     1.29  -4.44e-16   1.000