I have the data below and have fitted linear, ridge, and lasso regressions to it. For ridge and lasso I chose the optimal lambda by cross-validation. I now want to superimpose the fitted models on a y vs. x plot of the original data. The linear fit is already on the graph, but I can't figure out how to add the other two. I have tried this in ggplot, but an answer in base R would be really helpful too. Even a pointer in the right direction would be great.
The models themselves all run fine, and the linear regression line shows up on the plot. However, when I try to plot the other two fits in the same way, it doesn't work.
set.seed(1)
x <- rnorm(100)
y <- 1 + .2*x+3*x^2+.6*x^3 + rnorm(100)
d <- data.frame(x=x,y=y)
d$x2 <- d$x^2
d$x3 <- d$x^3
d$x4 <-d$x^4
d$x5 <-d$x^5
f <- lm(y ~ ., data=d)
library(glmnet)
x <- model.matrix(y ~ ., data=d)
y <- d$y
grid <- 0.001:50
ridge.fit <- glmnet(x,y,alpha=0, lambda = grid)
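# Cross-validate each penalty separately: cv.glmnet() defaults to alpha = 1 (lasso)
cv.ridge <- cv.glmnet(x, y, alpha = 0)
r.fit.new <- glmnet(x, y, alpha = 0, lambda = cv.ridge$lambda.min)
lasso.fit <- glmnet(x, y, alpha = 1, lambda = grid)
cv.lasso <- cv.glmnet(x, y, alpha = 1)
l.fit.new <- glmnet(x, y, alpha = 1, lambda = cv.lasso$lambda.min)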
library(ggplot2)
ggplot(data=d, aes(x=x, y=y)) + geom_point() + geom_line(aes(y=fitted(f)), colour="blue")
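I changed your data-creation code slightly: the x column is now called x.values, so it no longer shares a name with the model matrix x created later.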
set.seed(1)
x <- rnorm(100)
y <- 1 + .2*x+3*x^2+.6*x^3 + rnorm(100)
d <- data.frame(x.values=x,y=y)
d$x2 <- d$x.values^2
d$x3 <- d$x.values^3
d$x4 <-d$x.values^4
d$x5 <-d$x.values^5
Keep the rest of your code for building the model matrix and fitting the models as it is.
Then some munging to get the data into shape for plotting:
library(dplyr)
# x below is the model matrix from the global environment, not a column of d
data.for.plot <- d %>%
  select(x.values, y) %>%
  mutate(fitted_lm = as.numeric(fitted(f)),
         fitted_ridge_lm = as.numeric(predict(r.fit.new, newx = x)),
         fitted_lasso_lm = as.numeric(predict(l.fit.new, newx = x)))
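Before plotting, it can help to compare the three sets of fitted values numerically. The following is only a sketch under a few assumptions that are not in the original code: the predictor matrix X is built directly with cbind() (so it has no intercept column), each penalty gets its own cv.glmnet() run, and predictions are taken from the cross-validation objects with s = "lambda.min" instead of refitting. The names X, cv.ridge, cv.lasso, fits, gaps, and coefs are illustrative.

```r
library(glmnet)

set.seed(1)
x.values <- rnorm(100)
y <- 1 + .2*x.values + 3*x.values^2 + .6*x.values^3 + rnorm(100)

# Predictor matrix without an intercept column (glmnet fits its own intercept)
X <- cbind(x.values = x.values, x2 = x.values^2, x3 = x.values^3,
           x4 = x.values^4, x5 = x.values^5)

f <- lm(y ~ X)
cv.ridge <- cv.glmnet(X, y, alpha = 0)  # ridge penalty
cv.lasso <- cv.glmnet(X, y, alpha = 1)  # lasso penalty

# Fitted values of all three models at the observed points
fits <- cbind(
  lm    = as.numeric(fitted(f)),
  ridge = as.numeric(predict(cv.ridge, newx = X, s = "lambda.min")),
  lasso = as.numeric(predict(cv.lasso, newx = X, s = "lambda.min"))
)

# Largest absolute gap between each pair of fitted curves
gaps <- c(
  lm_vs_ridge    = max(abs(fits[, "lm"] - fits[, "ridge"])),
  lm_vs_lasso    = max(abs(fits[, "lm"] - fits[, "lasso"])),
  ridge_vs_lasso = max(abs(fits[, "ridge"] - fits[, "lasso"]))
)
print(gaps)

# Coefficients side by side (intercept in the first row)
coefs <- cbind(
  lm    = coef(f),
  ridge = as.matrix(coef(cv.ridge, s = "lambda.min"))[, 1],
  lasso = as.matrix(coef(cv.lasso, s = "lambda.min"))[, 1]
)
print(round(coefs, 3))
```

Small gaps relative to the spread of y indicate that the curves will largely overlap when drawn on the same axes.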
# Plot the data with all three fits overlaid
library(ggplot2)
ggplot(data.for.plot, aes(x = x.values, y = y)) +
  geom_point() +
  geom_line(aes(y = fitted_lm), colour = "blue") +
  geom_line(aes(y = fitted_ridge_lm), colour = "red") +
  geom_line(aes(y = fitted_lasso_lm), colour = "grey75") +
  theme_bw()
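One detail of the plot above: geom_line() joins the fitted values at the observed x values, so each curve is drawn as straight segments between data points. A possible variation, sketched here under assumptions, is to predict on an evenly spaced grid of x values instead. The helper make_poly() and the names train, X, grid, newX, and curves are hypothetical, and the sketch takes predictions from cv.glmnet() objects at s = "lambda.min" instead of refitting at a single lambda.

```r
library(glmnet)
library(ggplot2)

set.seed(1)
x.values <- rnorm(100)
y <- 1 + .2*x.values + 3*x.values^2 + .6*x.values^3 + rnorm(100)

# Hypothetical helper: build the polynomial columns x.values, x2, ..., x5
make_poly <- function(v) {
  data.frame(x.values = v, x2 = v^2, x3 = v^3, x4 = v^4, x5 = v^5)
}

train <- make_poly(x.values)
X <- as.matrix(train)

f <- lm(y ~ ., data = cbind(train, y = y))
cv.ridge <- cv.glmnet(X, y, alpha = 0)
cv.lasso <- cv.glmnet(X, y, alpha = 1)

# Evenly spaced x values over the observed range, with the same columns as X
grid <- make_poly(seq(min(x.values), max(x.values), length.out = 200))
newX <- as.matrix(grid)

curves <- data.frame(
  x.values  = grid$x.values,
  fit_lm    = as.numeric(predict(f, newdata = grid)),
  fit_ridge = as.numeric(predict(cv.ridge, newx = newX, s = "lambda.min")),
  fit_lasso = as.numeric(predict(cv.lasso, newx = newX, s = "lambda.min"))
)

# Points come from the raw data, lines from the grid predictions
p <- ggplot(data.frame(x.values = x.values, y = y), aes(x = x.values, y = y)) +
  geom_point() +
  geom_line(data = curves, aes(y = fit_lm), colour = "blue") +
  geom_line(data = curves, aes(y = fit_ridge), colour = "red") +
  geom_line(data = curves, aes(y = fit_lasso), colour = "grey50") +
  theme_bw()
print(p)
```

The columns of newX must appear in the same order as in the matrix used for fitting, which is why both are built by the same helper.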
You will notice that the fits are hard to tell apart because they lie very close to each other (which is good: the models agree). So let's reshape the data to long format and use faceting in ggplot to look at each fit individually:
library(tidyr)
# gather() still works but is superseded by pivot_longer() in current tidyr
data.for.plot.long <- gather(data.for.plot, key = fit_type, value = fits, -x.values, -y)
ggplot(data.for.plot.long, aes(y = y, x = x.values)) +
  geom_point() +
  geom_line(aes(y = fits, colour = fit_type)) +
  facet_wrap(~fit_type, ncol = 1, scales = "free") +
  theme_bw()
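Since the question also asked about base R, here is a minimal sketch of the same overlay using plot() and lines(). It makes a few assumptions that differ from the code above: the predictor matrix X is built directly with cbind(), each penalty is cross-validated separately, and predictions come from the cv.glmnet() objects at s = "lambda.min". The names X, cv.ridge, cv.lasso, and ord are illustrative.

```r
library(glmnet)

set.seed(1)
x.values <- rnorm(100)
y <- 1 + .2*x.values + 3*x.values^2 + .6*x.values^3 + rnorm(100)
X <- cbind(x.values = x.values, x2 = x.values^2, x3 = x.values^3,
           x4 = x.values^4, x5 = x.values^5)

f <- lm(y ~ X)
cv.ridge <- cv.glmnet(X, y, alpha = 0)
cv.lasso <- cv.glmnet(X, y, alpha = 1)

# lines() joins points in the order given, so sort by x first
ord <- order(x.values)

plot(x.values, y, pch = 16, col = "grey40", xlab = "x", ylab = "y")
lines(x.values[ord], fitted(f)[ord], col = "blue", lwd = 2, lty = 1)
lines(x.values[ord],
      predict(cv.ridge, newx = X, s = "lambda.min")[ord],
      col = "red", lwd = 2, lty = 2)
lines(x.values[ord],
      predict(cv.lasso, newx = X, s = "lambda.min")[ord],
      col = "darkgreen", lwd = 2, lty = 3)
legend("topleft", legend = c("lm", "ridge", "lasso"),
       col = c("blue", "red", "darkgreen"), lty = 1:3, lwd = 2, bty = "n")
```

Different line types are used because the curves are expected to overlap heavily. Without the sort, lines() would connect the points in data order and draw a zigzag.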