Search code examples
rlogistic-regressionr-marginaleffects

Is {marginaleffects} risk difference the same as change in probability from LPM?


Let's say I am interested in predicting the probability of surviving after the Titanic sunk across different social classes. Since this isn't an RCT, the groups are unlikely to be balanced, so I also want to account for any differences in surviving that arise from gender and age. I ran a logit model, but I find communicating odds ratios difficult whilst, risk differences and risk ratios are a little easier for laypeople to understand. My questions are:

  1. Is the risk difference the same as the difference in probability, and in turn, approximately the same as the coefficients (i.e. change in probability) from an Linear Probability Model?
  2. If so, how does marginaleffects calculate average difference. Is it 'assuming' a person (e.g. they're Male, they're a specific age)? Is it taking a mean (e.g. mean(Age) ?) and using the predict() function? How is it calculating the probability associated with 'PClass' specifically rather than the other variables?
library(tidyverse)
## Load Titanic library to get the dataset
library(titanic)
library(marginaleffects)

## Load the datasets
data("titanic_train")
data("titanic_test")

## Setting Survived column for test data to NA
titanic_test$Survived <- NA

## Combining Training and Testing dataset
complete_data <- rbind(titanic_train, titanic_test) %>% 
  filter(!Survived == "Unknown") %>% 
  mutate(Pclass = factor(Pclass),
         Sex = factor(Sex),
         Survived = factor(Survived)) 


lrm_unadj <- glm(Survived ~ Pclass,
                 data = complete_data, family = binomial(link = "logit"))

lrm_adj <- glm(Survived ~ Pclass + Sex + Age,
               data = complete_data, family = binomial(link = "logit"))


# Risk difference (difference in probabilities) - should this be similar to LPM?
avg_comparisons(lrm_unadj, variables = "Pclass")
avg_comparisons(lrm_adj, variables = "Pclass") 

lm(as.numeric(Survived) ~  Pclass, data = complete_data)%>% 
lmtest::coeftest(., vcov = sandwich::vcovHC, type = "HC3") %>% 
  broom::tidy(conf.int = T)

lm(as.numeric(Survived) ~  Pclass + Sex + Age, data = complete_data)%>% 
lmtest::coeftest(., vcov = sandwich::vcovHC, type = "HC3") %>% 
  broom::tidy(conf.int = T)


Solution

  • I strongly recommend you read this page, which explains all of this in a lot of details, using the same data: https://marginaleffects.com/vignettes/comparisons.html

    Is the risk difference the same as the difference in probability, and in turn, approximately the same as the coefficients (i.e. change in probability) from an Linear Probability Model?

    Yes.

    how does marginaleffects calculate...

    There are two basic approaches. First, you can do:

    1. Fix PClass to "1st" for all observation and compute the predicted probability for each row.
    2. Fix PClass to "2nd" for all observation and compute the predicted probability for each row.
    3. Take the difference between the two vectors of predicted probabilities.
    4. Take the mean of differences.

    This is what code like this would do:

    avg_comparisons(fit)
    

    The second strategy would be:

    1. Create a grid of covariate values of interest, ex: SexCode=1, Age=25.
    2. Compute the predicted probability for that hypothetical individual if PClass="1st"
    3. Compute the predicted probability for that hypothetical individual if PClass="2nd"
    4. Take the difference between the two predicted probabilities.

    This is what code like this would do:

    comparisons(fit, newdata = datagrid(SexCode=1, Age=25))
    

    Again, please refer to the documentation. This is all explained in a lot of detail, with many examples of manual computation:

    https://marginaleffects.com/vignettes/comparisons.html