r ggplot2 survival-analysis

R: Plot predicted and actual response over a time variable


I have a model where I would like to plot the prediction over a time variable. Adding the average response for that time point in the same graph would also be very helpful.

Here is some reproducible data.

set.seed(123)
x1 = rnorm(1000)           # some continuous variables 
x2 = rnorm(1000)
z = 1 + 2*x1 + 3*x2        # linear combination with a bias
pr = 1/(1+exp(-z))         # pass through an inv-logit function
y = rbinom(1000,1,pr)      # bernoulli response variable

#valid glm:
df = data.frame(y=y,x1=x1,x2=x2,time=rep(1:10,100))   # time points 1..10, repeated to 1000 rows
fit = glm( y~x1+x2,data=df,family="binomial")

Now I would like to plot mean(predict(fit,df,type="response")) by the group time as well as mean(y) by the group time.

Any hints or ideas?

EDIT: Thank you for the responses! Yes, I am aware that time is not in the model in this example; I just wanted to keep the example simple. In my real model, time is included. And yes, I want to plot the mean response and the mean prediction over time.


Solution

  • Plotting against time seems odd to me when time is not in the model. Below is one approach: calculate the mean of y and the mean of the prediction for each time point, then plot both.

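    library(tidyverse)
    
    # type = "response" returns probabilities; the default is the log-odds
    # scale, which is not comparable to the mean of the 0/1 outcome y
    df$pred <- predict(fit, type = "response")
    
    # mean observed and mean predicted response per time point, in long format
    means <- df %>% 
      group_by(time) %>% 
      summarize(mean_y = mean(y),
                mean_pred = mean(pred)) %>% 
      gather(mean, val, -time)
    
    # one line per series (mean_y, mean_pred)
    ggplot(means, aes(time, val, color = mean)) +
      geom_point() +
      geom_line()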
    

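
Since the real model includes time, here is a minimal sketch of the same idea with time in the model, using only base R. It rebuilds the simulated data from the question with the same seed. The linear `time` term, the object name `fit_time`, and the base-graphics plot are assumptions made for illustration, not something taken from the original post.

```r
# Simulated data as in the question (same seed and draw order)
set.seed(123)
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
pr <- 1 / (1 + exp(-(1 + 2 * x1 + 3 * x2)))
y  <- rbinom(n, 1, pr)
df <- data.frame(y = y, x1 = x1, x2 = x2, time = rep(1:10, length.out = n))

# Assumption: time enters the model as a simple linear term
fit_time <- glm(y ~ x1 + x2 + time, data = df, family = binomial)

# Predicted probabilities (response scale), comparable to the 0/1 outcome
df$pred <- predict(fit_time, newdata = df, type = "response")

# Mean observed and mean predicted response per time point
means <- aggregate(cbind(mean_y = y, mean_pred = pred) ~ time,
                   data = df, FUN = mean)

# Both series in one plot; ylim covers the range of both
plot(means$time, means$mean_y, type = "b", pch = 16,
     ylim = range(means$mean_y, means$mean_pred),
     xlab = "time", ylab = "mean response")
lines(means$time, means$mean_pred, type = "b", pch = 1, lty = 2)
legend("topright", legend = c("mean_y", "mean_pred"),
       pch = c(16, 1), lty = c(1, 2))
```

`means` has one row per time point, so it can also be reshaped to long format and passed to the ggplot call above. If time should be treated as categorical rather than linear, `factor(time)` in the formula would be the usual alternative.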