
How can you force predictions of a linear model to start at the same point as new data


I would like the predictions of my linear regression model to start at the same point as a predictor variable (init in the data below).

Data:

library(tidyverse)

df <- tibble(
  x = c(0:20, 0:20), 
  y = c(log(10:30 + 2), log(10:30 + 10)),
  init = c(rep(log(10 + 2), 21), rep(log(10 + 10), 21)),
  group = c(rep('A', 21), rep('B', 21))
)

Model:

lm_fit <- lm(y ~ log(x + 1) + init, data = df)

Example of model fitted to data:

# Keep one group as the data to predict on
newdata <- df %>%
  filter(group == 'A')

# Add predictions, reshape to long format, and plot data vs. model
newdata %>%
  mutate(pred_y = predict(lm_fit, newdata = newdata, type = 'response')) %>%
  pivot_longer(c(y, pred_y), names_to = 'pred_type', values_to = 'value') %>%
  ggplot(aes(x, value, colour = pred_type)) +
  geom_point() +
  geom_line()

How can I change my model so the red line (model) starts at the same value as the blue line (data)? i.e. when x=0, pred_y = y.



Solution

  • Treat your init variable as an offset, so that its coefficient is fixed at 1, and drop the intercept (-1 in the model formula). At x = 0 the term log(x + 1) is 0, so the prediction reduces to init.

    lm_fit <- lm(y ~ log(x + 1) + offset(init) - 1, data = df)
    


    After changing the model formula to log(y) ~ log(x + 1), one possible approach is to transform the y variable and use its new value at x = 0 as the offset (init) variable. I would actually recommend always deriving the offset from the y variable rather than computing it independently. This way only the data is modified and everything else stays the same.

    df <- df %>%
      group_by(group) %>%
      mutate(y = log(y),            # transform the response
             init = y[x == 0]) %>%  # offset: transformed y at x = 0 within each group
      ungroup()
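
A quick way to check the offset approach is to fit the model and compare the prediction at x = 0 with the observed y in each group. The sketch below uses the data from the question, written in base R so it runs without extra packages, and derives init from y at x = 0 as recommended above. The names start, fit and at_zero are illustrative only and do not appear in the original post.

```r
# Data from the question, rebuilt as a plain data.frame (base R only).
df <- data.frame(
  x = c(0:20, 0:20),
  y = c(log(10:30 + 2), log(10:30 + 10)),
  group = rep(c("A", "B"), each = 21)
)

# Derive the offset from y itself: the observed y at x = 0 in each group.
start <- df[df$x == 0, c("group", "y")]
df$init <- start$y[match(df$group, start$group)]

# No intercept (-1); init enters as an offset, i.e. with a fixed coefficient of 1.
fit <- lm(y ~ log(x + 1) + offset(init) - 1, data = df)

# predict() evaluates the offset in newdata, so it is included in the predictions.
df$pred_y <- predict(fit, newdata = df)

# At x = 0, log(x + 1) is 0, so pred_y should equal init (and therefore y).
at_zero <- df[df$x == 0, c("group", "y", "init", "pred_y")]
print(at_zero)
```

Only the slope on log(x + 1) is estimated here, so both groups share the same curve shape and differ only in their starting value.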
    
