Tags: r, regression, glmnet, lasso-regression, regularized

Why does model.matrix reduce the number of observations in ridge regression?


I'm using the glmnet package in R for ridge regression, and I tried it on the Hitters dataset from the ISLR package. The problem is that when I use model.matrix to create the design matrix, the number of observations drops for no reason I can see. This is the code:

library(ISLR)
library(glmnet)

data("Hitters")

set.seed(1)
train=sample(1:nrow(Hitters), nrow(Hitters)/2)
test=(-train)

train.data = Hitters[train,]
test.data = Hitters[test,]
train.x=model.matrix(Salary~.,train.data)[,-1]
train.y=train.data$Salary

In the code, I'm trying to predict the Salary variable using all the other variables. train.data has 161 observations, while train.x has only 131. I don't understand why that happens and would appreciate any help.


Solution

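  • You have NA values in the Salary field. model.matrix builds a model frame first, and with R's default na.action (na.omit) every row that has a missing value in a model variable is dropped, so train.x ends up with fewer rows than train.data.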

    You can identify the problem like this:

    missing.players <- setdiff(rownames(train.data), rownames(train.x))
    train.data[missing.players, ]
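
One possible fix is to drop the incomplete rows before building the design matrix, so that the predictors and the response stay the same length. The sketch below uses a small made-up data frame (toy, with invented numbers) in place of Hitters so that it runs with base R only; the names toy, x_dropped and toy_complete are illustrative and do not come from the question.

```r
# Toy stand-in for Hitters: Salary has two missing values (made-up numbers).
toy <- data.frame(
  Salary = c(100, NA, 250, 300, NA, 410),
  Hits   = c(10, 20, 30, 40, 50, 60),
  League = factor(c("A", "N", "A", "N", "A", "N"))
)

# model.matrix() builds a model frame internally; with the default
# na.action (na.omit), rows with an NA in any model variable are dropped.
x_dropped <- model.matrix(Salary ~ ., toy)[, -1]
nrow(toy)        # 6 rows in the data
nrow(x_dropped)  # 4 rows in the design matrix

# Remove incomplete rows up front so x and y stay aligned.
toy_complete <- na.omit(toy)
x <- model.matrix(Salary ~ ., toy_complete)[, -1]
y <- toy_complete$Salary
stopifnot(nrow(x) == length(y))
```

Applied to the question, the same idea would be to call na.omit on Hitters before sampling the train and test indices, assuming it is acceptable to discard the players with no recorded salary. Otherwise train.y keeps its NA entries and is longer than train.x.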