I have a real data set that consists of 40 observations. My fully specified model is a multiple linear regression model. Now, I am wondering how can I simulate many y values from this model corresponding to a single x value. Of course, I know how to predict a y value using lm command on R, but how to get multiple y values is the question. Any hint is really appreciated.
Thanks
I think the only way you would be able to do that is to keep your first predictor (X1) constant while allowing your other variables to vary in the model. Otherwise you will simply get the same y value every time if you don't manipulate anything else. As an example, here I fit the iris
dataset into a multiple linear regression. I then created new data with Petal.Width
held at the same value and Sepal.Width
changing for each entry of the first predictor. Then I ran a prediction based off that data.
#### Library ####
library(tidyverse)
#### Fit MLR ####
fit <- lm(Petal.Length ~ Petal.Width + Sepal.Width,
data = iris)
#### Add Constant X1 and Changing X2 ####
new.data <- data.frame(Petal.Width = c(1.1,1.1,1.1,1.1,1.1),
Sepal.Width = c(3.2,3.5,3.4,2.7,2.5))
#### Predict Values of Y ####
pred <- predict(fit,
newdata = new.data)
pred.df <- data.frame(pred)
#### Plot Predicted Values ####
pred.df %>%
ggplot(aes(x=1:5,
y=pred,
label=Sepal.Width))+
geom_point()+
geom_line()+
geom_label()+
labs(x="Simulated Observation Number",
y="Predicted Value of Petal Length",
title = "Five Simulations of 1.1 Petal Width")
Which gives you this new data, labeled by the manipulated X2 values:
Of course you could also randomly sample your data filtered by your constant value and plot that, but you would need to have enough samples that have different X1-X2 values. Here I changed my constant X1 to 1.5 and sampled 10 times:
#### Randomly Sample Draws ####
sample.data <- iris %>%
filter(Petal.Width == 1.5) %>%
slice_sample(n=10)
pred2 <- predict(fit,
newdata = sample.data)
pred.df2 <- data.frame(pred2,
sample.data)
#### Plot Predicted Values ####
pred.df2 %>%
ggplot(aes(x=1:10,
y=pred2,
label=Sepal.Width))+
geom_point()+
geom_line()+
geom_label()+
labs(x="Simulated Observation Number",
y="Predicted Value of Petal Length",
title = "10 Random Samples of 1.5 Petal Width")
Lastly, you could totally randomize the X2 value and then run that in your predictions to see what they would look like (would possibly be more generalizable as its not fitting just to the data you have). You can achieve this with the rnorm
function if it is a normally distributed variable and other methods like it for other distributions. I forgot to set a seed for this, but the results should look similar.
#### Completely Random with Any Value ####
most.random <- data.frame(Petal.Width = rep(1.5,
100),
Sepal.Width = round(rnorm(n=100,
mean=2,
sd=1),
2))
#### Predict ####
random.pred <- predict(fit,
most.random)
#### Fit to Data Frame ####
random.data <- data.frame(random.pred,
most.random)
#### Plot ####
random.data %>%
ggplot(aes(x=1:100,
y=random.pred,
label=Sepal.Width))+
geom_point()+
geom_line()+
geom_label(size=1)+
labs(x="Simulated Observation Number",
y="Predicted Value of Petal Length",
title = "100 Totally Random Samples of 1.5 Petal Width")