Tags: r, linear-regression, predict

Predict function producing fewer rows than provided


I am currently trying to run the following code:

pv_model <- glm(SalePrice ~ MSSubClass + MSZoning..., data = train)
summary(pv_model)
pv_predict <- predict(pv_model)
train$PV <- pv_predict

However, when I try to assign the predictions as a column in the train data set, I get this error:

Error: Assigned data `predict(pv_model)` must be compatible with existing data.
x Existing data has 730 rows.
x Assigned data has 540 rows.
i Only vectors of size 1 are recycled.

Upon further inspection, it looks like my pv_predict variable contains only 540 values, even though train has 730 rows. What accounts for this difference? Why does the predict function drop so many rows, and what can I do to fix this?

Any help is appreciated.


Solution

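  • Missing data in the training set is the likely cause. By default, glm drops every row that has an NA in any model variable, and predict(pv_model) without newdata returns fitted values only for the rows that were actually used in the fit. Try: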

    predict(pv_model, newdata=train)
    

    This returns one prediction per row of train, with NA wherever a predictor is missing, so the result can be assigned back as a column.
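
    To see the row-count mismatch on data anyone can run, here is a minimal sketch. It assumes R's built-in airquality data set as a stand-in for train (the original data and full formula are not shown), and the model formula and the names dat, fit, and fit_excl are illustrative only. It also shows na.action = na.exclude, another way to get predictions padded to the full length.

```r
# Stand-in for `train`: a built-in data set with NAs in Ozone and Solar.R.
dat  <- airquality
vars <- c("Ozone", "Solar.R", "Wind", "Temp")

# Default behaviour: rows with an NA in any model variable are dropped.
fit <- glm(Ozone ~ Solar.R + Wind + Temp, data = dat)

p_default <- predict(fit)                 # one value per row used in the fit
p_newdata <- predict(fit, newdata = dat)  # one value per row of dat

# Alternative: na.exclude still drops the rows when fitting, but
# predict() pads the result with NA at the dropped positions.
fit_excl  <- glm(Ozone ~ Solar.R + Wind + Temp, data = dat,
                 na.action = na.exclude)
p_exclude <- predict(fit_excl)

# Compare lengths against the number of rows and of complete cases.
print(c(rows          = nrow(dat),
        complete_rows = sum(complete.cases(dat[, vars])),
        default       = length(p_default),
        newdata       = length(p_newdata),
        na_exclude    = length(p_exclude)))

# Either full-length vector can now be assigned as a column.
dat$PV <- p_newdata
```

    The two approaches differ slightly: with newdata, a row whose response is missing but whose predictors are present still gets a prediction, whereas with na.exclude that row gets NA because it was excluded from the fit.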