Search code examples
rlinear-regressionpredictionlm

Simulate many y values from a linear model corresponding to a single x value


I have a real data set that consists of 40 observations. My fully specified model is a multiple linear regression model. Now, I am wondering how can I simulate many y values from this model corresponding to a single x value. Of course, I know how to predict a y value using lm command on R, but how to get multiple y values is the question. Any hint is really appreciated.

Thanks


Solution

  • Manual Method

    I think the only way you would be able to do that is to keep your first predictor (X1) constant while allowing your other variables to vary in the model. Otherwise you will simply get the same y value every time if you don't manipulate anything else. As an example, here I fit the iris dataset into a multiple linear regression. I then created new data with Petal.Width held at the same value and Sepal.Width changing for each entry of the first predictor. Then I ran a prediction based off that data.

    #### Library ####
    library(tidyverse)
    
    #### Fit MLR ####
    fit <- lm(Petal.Length ~ Petal.Width + Sepal.Width,
              data = iris)
    
    #### Add Constant X1 and Changing X2 ####
    new.data <- data.frame(Petal.Width = c(1.1,1.1,1.1,1.1,1.1),
                           Sepal.Width = c(3.2,3.5,3.4,2.7,2.5))
    
    #### Predict Values of Y ####
    pred <- predict(fit,
            newdata = new.data)
    pred.df <- data.frame(pred)
    
    #### Plot Predicted Values ####
    pred.df %>% 
      ggplot(aes(x=1:5,
                 y=pred,
                 label=Sepal.Width))+
      geom_point()+
      geom_line()+
      geom_label()+
      labs(x="Simulated Observation Number",
           y="Predicted Value of Petal Length",
           title = "Five Simulations of 1.1 Petal Width")
    

    Which gives you this new data, labeled by the manipulated X2 values:

    enter image description here

    Randomly Sample from Present Data

    Of course you could also randomly sample your data filtered by your constant value and plot that, but you would need to have enough samples that have different X1-X2 values. Here I changed my constant X1 to 1.5 and sampled 10 times:

    #### Randomly Sample Draws ####
    sample.data <- iris %>% 
      filter(Petal.Width == 1.5) %>% 
      slice_sample(n=10)
    
    pred2 <- predict(fit,
            newdata = sample.data)
    
    pred.df2 <- data.frame(pred2,
                 sample.data)
    
    
    #### Plot Predicted Values ####
    pred.df2 %>% 
      ggplot(aes(x=1:10,
                 y=pred2,
                 label=Sepal.Width))+
      geom_point()+
      geom_line()+
      geom_label()+
      labs(x="Simulated Observation Number",
           y="Predicted Value of Petal Length",
           title = "10 Random Samples of 1.5 Petal Width")
    

    enter image description here

    Simulate Random Data Not in Sample

    Lastly, you could totally randomize the X2 value and then run that in your predictions to see what they would look like (would possibly be more generalizable as its not fitting just to the data you have). You can achieve this with the rnorm function if it is a normally distributed variable and other methods like it for other distributions. I forgot to set a seed for this, but the results should look similar.

    #### Completely Random with Any Value ####
    most.random <- data.frame(Petal.Width = rep(1.5,
                                                100),
                           Sepal.Width = round(rnorm(n=100,
                                               mean=2,
                                               sd=1),
                                               2))
    #### Predict ####
    random.pred <- predict(fit,
            most.random)
    
    #### Fit to Data Frame ####
    random.data <- data.frame(random.pred,
                              most.random)
    
    #### Plot ####
    random.data %>% 
      ggplot(aes(x=1:100,
                 y=random.pred,
                 label=Sepal.Width))+
      geom_point()+
      geom_line()+
      geom_label(size=1)+
      labs(x="Simulated Observation Number",
           y="Predicted Value of Petal Length",
           title = "100 Totally Random Samples of 1.5 Petal Width")
    

    enter image description here