In glmnet() I have to supply the raw X matrix and response vector Y (unlike lm(), where you can specify a model formula). model.matrix() will correctly remove incomplete observations from the X matrix, but it doesn't include the response in the output object. So I will have something like this:
mydf
glmnet(y = mydf$response, x = model.matrix(myformula, mydf)[,-1], ...)
When model.matrix removes observations the y and x dimensions won't match. Is there a function to align y data to x?
Try using model.frame and model.response.
> d <- data.frame(y=rnorm(3), x=c(1,NA,2), z=c(NA, NA, 1))
> d
y x z
1 -0.6257260 1 NA
2 -0.4979723 NA NA
3 -1.2233772 2 1
> form <- y~x
> mf <- model.frame(form, data=d)
> model.response(mf)
1 3
-0.625726 -1.223377
> model.matrix(form, mf)
(Intercept) x
1 1 1
3 1 2
attr(,"assign")
[1] 0 1
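I'm not familiar with glmnet, but it might be the case that mf alone is sufficient: just pass y=mf[,1] and x=mf[,-1,drop=FALSE] (column indexing, since the response is the first column of the model frame). Note that mf[,-1,drop=FALSE] is still a data frame, so if glmnet insists on a numeric matrix, use model.matrix(form, mf)[,-1,drop=FALSE] instead.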
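Putting this together for the glmnet case in the question, here is a minimal sketch. The data in mydf and the column names x1 and x2 are made up for illustration, and the glmnet call is guarded because I haven't verified it beyond the documented x/y arguments. The idea is to build the model frame once, so the NA handling is applied to the response and the predictors together, and then to derive both y and x from that same frame.

```r
# Hypothetical data: 20 rows, with one NA in each predictor
set.seed(1)
mydf <- data.frame(response = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))
mydf$x1[2] <- NA
mydf$x2[5] <- NA
myformula <- response ~ x1 + x2

# model.frame() drops incomplete rows (default option na.action = na.omit)
# from the response and the predictors at the same time
mf <- model.frame(myformula, data = mydf)

# Both pieces come from the same frame, so their rows line up
y <- model.response(mf)
x <- model.matrix(myformula, mf)[, -1, drop = FALSE]  # drop the intercept column

stopifnot(length(y) == nrow(x))

# Only attempt the fit if glmnet is installed
if (requireNamespace("glmnet", quietly = TRUE)) {
  fit <- glmnet::glmnet(x = x, y = y)
}
```

The drop = FALSE keeps x a matrix even when the formula has a single predictor; as far as I know glmnet wants at least two columns in x anyway.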