
Xgboost: using single test observation?


I want to fit a time series model using xgboost in R, and I want to use only the last observation to test the model (in a rolling window forecast there will be more test points in total). But when the test data contains only a single observation, I get the error: Error in xgb.DMatrix(data = X[n, ], label = y[n]) : xgb.DMatrix does not support construction from double. Is it possible to do this, or do I need a minimum of 2 test points?

Reproducible example:

library(xgboost)
n = 1000
X = cbind(runif(n,0,20), runif(n,0,20))
y = X %*% c(2,3) + rnorm(n,0,0.1)

train = xgb.DMatrix(data  = X[-n,],
                    label = y[-n])

test = xgb.DMatrix(data   = X[n,],
                    label = y[n]) # error here, y[.] has 1 value

test2 = xgb.DMatrix(data   = X[(n-1):n,],
                    label = y[(n-1):n]) # works here, y[.] has 2 values

There's another post here that addresses a similar issue; however, it concerns the predict() function, whereas I mean the test data that will later go into the watchlist argument of xgboost and be used, e.g., for early stopping.


Solution

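  • The problem is not the number of test points. Subsetting a matrix with a single row index drops its dimensions and returns a plain numeric vector, and xgb.DMatrix cannot be constructed from that. Compare: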

    class(X[n, ])
    # [1] "numeric"
    
    class(X[n,, drop = FALSE])
    #[1] "matrix" "array" 
    

    Use X[n,, drop = FALSE] so that the test sample stays a one-row matrix:

    test = xgb.DMatrix(data   = X[n,, drop = FALSE], label = y[n])
    
    # xgb.train() accepts an xgb.DMatrix directly
    xgb.model <- xgb.train(data = train, nrounds = 15)
    predict(xgb.model, test)
    # [1] 62.28553
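
For the rolling-window case mentioned in the question, the same idea can be wrapped in a small helper. The following is only a sketch in base R (no xgboost needed to see the behaviour); `split_last` is a hypothetical name introduced here for illustration, not part of any package. It keeps both the training rows and the single test row as matrices, so either piece could then be passed to xgb.DMatrix.

```r
# Small toy data, same shape as in the question (two feature columns).
set.seed(1)
n <- 5
X <- cbind(runif(n, 0, 20), runif(n, 0, 20))
y <- as.numeric(X %*% c(2, 3))

# Hypothetical helper: hold out the last row as a one-row test set.
# drop = FALSE prevents R from collapsing a single row into a vector.
split_last <- function(X, y) {
  n <- nrow(X)
  list(
    train_x = X[-n, , drop = FALSE],
    train_y = y[-n],
    test_x  = X[n, , drop = FALSE],  # stays a 1 x p matrix
    test_y  = y[n]
  )
}

parts <- split_last(X, y)

dim(X[n, ])         # NULL: dimensions were dropped, plain numeric vector
dim(parts$test_x)   # 1 2
dim(parts$train_x)  # 4 2
```

Using drop = FALSE on the training subset as well is a defensive choice: it also covers the edge case where a window leaves only one training row.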