I would like my regression model (fitted with lm) to start at the same value as a predictor variable.
Data:
library(tidyverse)

df <- tibble(
  x = c(0:20, 0:20),
  y = c(log(10:30 + 2), log(10:30 + 10)),
  init = c(rep(log(10 + 2), 21), rep(log(10 + 10), 21)),
  group = c(rep('A', 21), rep('B', 21))
)
Model:
lm_fit <- lm(y ~ log(x + 1) + init, data = df)
Example of model fitted to data:
# Keep group A only, then predict on that subset
newdata <- df %>%
  filter(group == 'A')
# Plot observed (y) and predicted (pred_y) values together
newdata %>%
  mutate(pred_y = predict(lm_fit, newdata = newdata, type = 'response')) %>%
  pivot_longer(c(y, pred_y), names_to = 'pred_type', values_to = 'value') %>%
  ggplot(aes(x, value, colour = pred_type)) +
  geom_point() +
  geom_line()
How can I change my model so that the red line (model) starts at the same value as the blue line (data), i.e. so that pred_y = y when x = 0?
Using your init variable, you have to treat it as an offset (its coefficient will be fixed at 1) and disable the intercept (-1 in the model formula).
lm_fit <- lm(y ~ log(x + 1) + offset(init) - 1, data = df)
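The reason this works: with the offset and no intercept, the fitted model is y = init + b * log(x + 1), and log(0 + 1) = 0, so the prediction at x = 0 is exactly init. Below is a minimal sketch to check that, assuming the same data as in the question (rebuilt here with base R so it runs on its own; the name start is just for illustration).

```r
# Same data as in the question, rebuilt with base R only
df <- data.frame(
  x = c(0:20, 0:20),
  y = c(log(10:30 + 2), log(10:30 + 10)),
  init = rep(c(log(10 + 2), log(10 + 10)), each = 21),
  group = rep(c('A', 'B'), each = 21)
)

# Offset fixes the coefficient of init at 1; -1 removes the intercept
lm_fit <- lm(y ~ log(x + 1) + offset(init) - 1, data = df)

# Only the slope on log(x + 1) is estimated
coef(lm_fit)

# Predictions at x = 0, one row per group ('start' is an illustrative name)
start <- df[df$x == 0, ]
start$pred_y <- unname(predict(lm_fit, newdata = start))
start[, c('group', 'y', 'pred_y')]

# TRUE if each prediction at x = 0 matches the observed starting value
all.equal(start$pred_y, start$y)
```

Note that predict() picks up the offset from newdata, so the init column has to be present in whatever data frame you predict on.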
After changing the model formula to log(y) ~ log(x + 1), a possible approach is to transform the y variable and use its new value at x = 0 as the offset (init) variable. (I would actually recommend always deriving the offset from the y variable rather than computing it independently.) This way only the data is modified, and the rest stays the same.
df <- df %>%
  group_by(group) %>%
  mutate(y = log(y),
         init = y[x == 0])
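To see that the same offset formula still pins the starting value after the transformation, here is a rough sketch, assuming dplyr is installed and using the data from the question. The names df_log, lm_fit_log and start are made up for this example, and the ungroup() call is my addition to keep later steps from inheriting the grouping.

```r
library(dplyr)

# Data from the question, without a precomputed init column
df <- tibble(
  x = c(0:20, 0:20),
  y = c(log(10:30 + 2), log(10:30 + 10)),
  group = rep(c('A', 'B'), each = 21)
)

# Log-transform y, then take each group's value at x = 0 as its offset
df_log <- df %>%
  group_by(group) %>%
  mutate(y = log(y),
         init = y[x == 0]) %>%
  ungroup()

# Same formula as before: offset on init, no intercept
lm_fit_log <- lm(y ~ log(x + 1) + offset(init) - 1, data = df_log)

# Compare prediction and (transformed) data at x = 0 for both groups
start <- filter(df_log, x == 0)
start$pred_y <- unname(predict(lm_fit_log, newdata = start))
select(start, group, y, pred_y)

# TRUE if the fitted curve starts at the data value in every group
all.equal(start$pred_y, start$y)
```

Because init is derived from y inside the pipeline, the offset stays consistent with whatever transformation is applied to y.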