
Getting Warning: «'newdata' had 150 rows but variables found have 350 rows» on LDA Predict in R


I am trying to run the predict function for an LDA model. I have two predictors, x1 and x2, and a categorical response y that takes the values -1 and 1. Each variable contains 500 data points (N = 500), and I am splitting the dataset as follows:

xx = data.frame(cbind(x1,x2))
x = cbind(x1,x2)
x_train = x[1:350,]
x_test = x[351:N,]
y_train = y[1:350]
y_test = y[351:N]

Some output:

          x1        x2  y
1 -1.1843924  1.920765 -1
2  3.3167508  2.321631  1
3 -3.0301378  5.973256 -1
4 -1.3262624 -2.320463 -1
5 -0.6534166 -3.050822 -1
6 -2.0051728 -4.118190 -1

Then I fit the LDA model and try the predict function:

modelo.lda = lda(y_train~xx[1:350,1]+xx[1:350,2])
predict.lda = predict(modelo.lda, newdata=xx[351:N,])

Note: I wrote the xx terms this way following an answer to a question about the same problem.

But that is where I get:

Warning message: 'newdata' had 150 rows but variables found have 350 rows

I thought that keeping the same xx[init:end,] form would fix the problem, as the answer to that question stated, but it seems it doesn't.

What could it be?

Thanks in advance.


Solution

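  • In your call the formula terms are expressions such as xx[1:350,1]. predict() cannot find columns with those names in newdata, so it evaluates them again from your workspace and gets the 350 training rows, hence the warning. As a suggestion, if you have train and test sets, keep the response and the predictors in one data frame, write the formula with plain column names, and pass the training set through the data argument; predict() can then look up the same names in newdata. Try this: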

    library(MASS)
    #Data
    N <- 500
    x1 <- rnorm(N,0,1)
    x2 <- rnorm(N,1,5)
    y <- round(runif(N,0,1),0)
    xx = data.frame(x1,x2,y)
    x_train = xx[1:350,]
    x_test = xx[351:N,]
    #Models
    modelo.lda = lda(y~x1+x2,data = x_train)
    predict.lda = predict(modelo.lda, newdata=x_test)
    

    No warnings will be produced.
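To see the difference between the two formula styles side by side, here is a minimal sketch rather than a definitive recipe. It assumes simulated data: the seed, the way the -1/1 labels are generated, and the names train, test, bad_fit, good_fit, bad_msg and pred are illustrative choices, not part of the original question. It first reproduces the warning with the subsetting-style formula, then checks that the column-name formula returns one prediction per test row.

```r
library(MASS)

set.seed(1)                              # arbitrary seed, only for reproducibility
N  <- 500
x1 <- rnorm(N, 0, 1)
x2 <- rnorm(N, 1, 5)
y  <- ifelse(x1 + rnorm(N) > 0, 1, -1)   # simulated labels in {-1, 1} (assumption)
xx <- data.frame(x1, x2, y)
train <- xx[1:350, ]
test  <- xx[351:N, ]

# Formula built from subsetting expressions, as in the question.
# The terms are named "xx[1:350, 1]" and "xx[1:350, 2]", which are not
# columns of newdata, so predict() re-evaluates them from the workspace.
y_train <- y[1:350]
bad_fit <- lda(y_train ~ xx[1:350, 1] + xx[1:350, 2])
bad_msg <- tryCatch(
  { predict(bad_fit, newdata = test); NA_character_ },
  warning = function(w) conditionMessage(w)   # capture the warning text
)
print(bad_msg)

# Formula built from column names, resolved inside data / newdata.
good_fit <- lda(y ~ x1 + x2, data = train)
pred     <- predict(good_fit, newdata = test)

length(pred$class)       # number of predicted labels
nrow(pred$posterior)     # number of rows of posterior probabilities

# Confusion table and accuracy on the held-out rows.
table(predicted = pred$class, actual = test$y)
mean(as.character(pred$class) == as.character(test$y))
```

If bad_msg contains the 'newdata' warning and pred has as many entries as test has rows, the column-name formula is doing what was intended. The accuracy value depends on the simulated data, so it is not meaningful beyond this illustration.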