Search code examples
rglmpredictmfp

Find value of covariate given a probability in R


Given a fractional polynomial GLM, I am looking to find the value of a covariate that gives me an output of a given probability.

My data is simulated using:

# FUNCTIONS ====================================================================

logit <- function(p){
  x = log(p/(1-p))
  x
}

sigmoid <- function(x){
  p = 1/(1 + exp(-x))
  p
}

beta_duration <- function(D, select){
  logit(
    switch(select,
           0.05 + 0.9 / (1 + exp(-2*D + 25)),
           0.9 * exp(-exp(-0.5 * (D - 11))),
           0.9 * exp(-exp(-(D - 11))),
           0.9 * exp(-2 * exp(-(D - 9))),
           sigmoid(0.847 + 0.210 * (D - 10)),
           0.7 + 0.0015 * (D - 10) ^ 2,
           0.7 - 0.0015 * (D - 10) ^ 2 + 0.03 * (D - 10)
    )
  )
}

beta_sex <- function(sex, OR = 1){
  ifelse(sex == "Female", -0.5 * log(OR), 0.5 * log(OR))
}

plot_beta_duration <- function(select){
  x <- seq(10, 20, by = 0.01)
  y <- beta_duration(x, select)
  data.frame(x = x,
             y = y) %>%
    ggplot(aes(x = x, y = y)) +
    geom_line() +
    ylim(0, 1)
}


# DATA SIMULATION ==============================================================

duration <- c(10, 12, 14, 18, 20)
sex <- factor(c("Female", "Male"))

eta <- function(duration, sex, duration_select, sex_OR, noise_sd){
  beta_sex(sex, sex_OR) + beta_duration(duration, duration_select) + rnorm(length(duration), 0, noise_sd)
}

sim_data <- function(durations_type, sex_OR, noise_sd, p_female, n, seed){
  set.seed(seed)
  data.frame(
    duration = sample(duration, n, TRUE),
    sex = sample(sex, n, TRUE, c(p_female, 1 - p_female))
  ) %>%
    rowwise() %>%
    mutate(eta = eta(duration, sex, durations_type, sex_OR, noise_sd),
           p = sigmoid(eta),
           cured = sample(0:1, 1, prob = c(1 - p, p)))
}

# DATA SIM PARAMETERS
durations_type <- 4 # See beta_duration for functions
sex_OR <- 3 # Odds of cure for male vs female (ref)
noise_sd <- 1
p_female <- 0.7 # proportion of females in the sample
n <- 500 

data <- sim_data(durations_type = 1, # See beta_duration for functions
                 sex_OR = 3, # Odds of cure for male vs female (ref)
                 noise_sd = 1,
                 p_female = 0.7, # proportion of females in the sample
                 n = 500,
                 seed = 21874564)

And my model is fitted by:

library(mfp)

model1 <- mfp(cured ~ fp(duration) + sex,
              family = binomial(link = "logit"),
              data = data)
summary(model1)

For each level of sex (i.e. "Male" or "Female"), I want to find the value of duration that gives me a probability equal to some value frontier <- 0.8.

So far, I can only think of using an approximation using a vector of possibilities:

pred_duration <- seq(10, 20, by = 0.1)
pred <- data.frame(expand.grid(duration = pred_duration,
                       sex = sex),
           p = predict(model1, 
                       newdata = expand.grid(duration = pred_duration,
                                            sex = sex),
                       type = "response"))

pred[which(pred$p > 0.8), ] %>%
  group_by(sex) %>%
  summarize(min(duration))

But I am really after an exact solution.


Solution

  • The function uniroot allows you to detect the point at which the output of a function equals 0. If you create a function that takes duration as input, calculates the predicted probability from that duration, then subtracts the desired probability, then this function will have an output of 0 at the desired value of duration. uniroot will find this value for you. If you wrap this process in a little function, it makes it very easy to use:

    find_prob <- function(p) {
    
      f <- function(v) {
        predict(model1, type = 'response',
                newdata = data.frame(duration = v, sex = 'Male')) - p
      }
      uniroot(f, interval = range(data$duration), tol = 1e-9)$root
    }
    

    So, for example, to find the duration that gives an 80% probability, we just do:

    find_prob(0.8)
    #> [1] 12.86089
    

    To prove that this is the correct value, we can feed it directly into predict to see what the predicted probability will be given sex = male and duration = 12.86089

    predict(model1, type = 'response',
            newdata = data.frame(sex = 'Male', duration = find_prob(0.8)))
    #>   1 
    #> 0.8