Search code examples
rplotprobabilitylogistic-regressionordinal

How to plot the predicted probabilities for an ordered logit regression?


I want to plot a similar plot as this one in the buttom of the page: ordered logit

They use a variabel on the x-axis that is categorical (0-10) and therefore they use seq(0, 10, 1) and hold all other variables constant at their means when they determine the predicted probabilities for the plot.

However in my data the variabel I want to have on the x-axis is a dummy (0-1) and therefore seq() does not work. What can I do instead to do a similar plot? Do you have any exapmples.


Solution

  • It's difficult to walk you through your own code in the absence of a reproducible example, but let's create one with a dummy variable on the x axis.

    Here, we have a binary outcome variable, a dummy predictor variable, and a numeric predictor variable that we want to hold at its mean value for the purposes of the plot:

    set.seed(1)
    
    df <- data.frame(dummy = rep(0:1, each = 20),
                     other_var = runif(40),
                     outcome = rbinom(40, 1, rep(c(0.3, 0.7), each = 20)))
    

    We can create a logistic regression like this:

    model <- glm(outcome ~ dummy + other_var, family = binomial, data = df)
    
    summary(model)
    #> 
    #> Call:
    #> glm(formula = outcome ~ dummy + other_var, family = binomial, 
    #>     data = df)
    #> 
    #> Deviance Residuals: 
    #>     Min       1Q   Median       3Q      Max  
    #> -1.4942  -0.9675  -0.5557   0.9465   2.0245  
    #> 
    #> Coefficients:
    #>              Estimate Std. Error z value Pr(>|z|)  
    #> (Intercept) -0.006877   0.831579  -0.008   0.9934  
    #> dummy        1.447983   0.713932   2.028   0.0425 *
    #> other_var   -2.120011   1.339460  -1.583   0.1135  
    #> ---
    #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    #> 
    #> (Dispersion parameter for binomial family taken to be 1)
    #> 
    #>     Null deviance: 54.548  on 39  degrees of freedom
    #> Residual deviance: 46.726  on 37  degrees of freedom
    #> AIC: 52.726
    #> 
    #> Number of Fisher Scoring iterations: 4
    

    To predict the outcome for each value of dummy at the mean of other_var, we create a little prediction data frame with one column holding each unique value of dummy and another column filled with the mean value of other_var:

    pred_df <- data.frame(dummy = 0:1, other_var = mean(df$other_var))
    

    Now we feed this to predict

    preds <- predict(model, pred_df, se.fit = TRUE)
    

    Since we are dealing with log odds here, we need a little function to convert log odds to probabilities:

    log_odds_to_probs <- function(x) exp(x) / (1 + exp(x))
    

    Now we can get our predicted values as well as the 95% confidence intervals for the predictions as probabilities:

    pred_df$fit   <- log_odds_to_probs(preds$fit)
    pred_df$lower <- log_odds_to_probs(preds$fit - 1.96 * preds$se.fit)
    pred_df$upper <- log_odds_to_probs(preds$fit + 1.96 * preds$se.fit)
    

    Finally, we plot the result using whatever style you like:

    library(ggplot2)
    
    ggplot(pred_df, aes(factor(dummy), fit)) +
      geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2, size = 1.5,
                    color = 'deepskyblue4') +
      geom_point(size = 4) +
      labs(x = 'dummy', y = 'probability') +
      ylim(c(0, 1)) +
      theme_minimal(base_size = 16)
    

    enter image description here

    Created on 2022-12-06 with reprex v2.0.2