Tags: r, xgboost, large-data

R xgboost - how to use local data files?


I have a large dataset, and I'm trying to use the R package xgboost to perform a regression on it.

The documentation of the xgboost function says that the data argument can be a local data file, which I take to mean the name of the file to load. There are, however, no further details, so my question is: what format should this file be in?

I've tried

random=matrix(rnorm(15),5,3)
colnames(random)=c("first","second","label")
write.csv(random,"random.csv")
bst <- xgboost(data = "random.csv", 
               nthread = 7, 
               nround = 3,
               objective="reg:linear",
               verbose=FALSE)

but that returns

6x0 matrix with 0 entries is loaded from random.csv
Error in xgb.iter.update(bst$handle, dtrain, i - 1, obj) : 
NumCol:need column access

Many thanks!


Solution

  • XGBoost's local data file input does not support CSV. Quoting from this link:

    Currently XGBoost supports local data files in the libsvm format. (Source: http://blog.nycdatascience.com/uncategorized/xgboost-introduction/)

    See this Cross Validated Question/Answer for more information on what the libsvm format is.
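    In a libsvm file, each line holds one observation: the label first, then space-separated index:value pairs for the features, with 1-based feature indices. As a minimal sketch, assuming the last column of the question's matrix is the label, the hypothetical helper write_libsvm() below (plain base R, not part of the xgboost package) writes a matrix in that format:

```r
# Hypothetical helper (not part of xgboost): writes a numeric matrix `x` and
# a label vector `label` to `file` in the libsvm text format, one row per line:
#   <label> <index>:<value> <index>:<value> ...
# Feature indices are 1-based, as in standard libsvm files.
write_libsvm <- function(x, label, file) {
  stopifnot(is.matrix(x), is.numeric(x), length(label) == nrow(x))
  lines <- vapply(seq_len(nrow(x)), function(i) {
    row <- x[i, ]
    # Write every non-missing entry. libsvm files often omit zeros, but
    # XGBoost treats absent entries as missing rather than as 0.
    keep <- which(!is.na(row))
    if (length(keep) == 0) return(as.character(label[i]))
    feats <- paste0(keep, ":", as.character(row[keep]), collapse = " ")
    paste(as.character(label[i]), feats)
  }, character(1))
  writeLines(lines, file)
  invisible(lines)
}

# Same toy data as in the question: the last column is the label.
random <- matrix(rnorm(15), 5, 3)
colnames(random) <- c("first", "second", "label")
libsvm_file <- file.path(tempdir(), "random.libsvm")
write_libsvm(random[, c("first", "second")], random[, "label"], libsvm_file)
```

    The resulting file name can then be used in place of "random.csv". Depending on your xgboost version, you may need to load it through xgb.DMatrix() and/or append ?format=libsvm to the file name, so check the documentation of the version you have installed.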

    Hope this helps.