I have a large dataset, and I'm trying to use R's xgboost package to perform a regression on it.
The documentation for the function xgboost says that the argument data can be a local data file, which I take to mean the name of the file to be used. There are no further specifications, however, so my question is: what format exactly should this file have?
I've tried
random=matrix(rnorm(15),5,3)
colnames(random)=c("first","second","label")
write.csv(random,"random.csv")
bst <- xgboost(data = "random.csv",
               nthread = 7,
               nround = 3,
               objective = "reg:linear",
               verbose = FALSE)
but that returns
6x0 matrix with 0 entries is loaded from random.csv
Error in xgb.iter.update(bst$handle, dtrain, i - 1, obj) :
NumCol:need column access
Many thanks!
The xgboost local data file input does not support CSV. Quoting from this link (http://blog.nycdatascience.com/uncategorized/xgboost-introduction/#sthash.bmlHst0T.dpuf):
"Currently XGBoost supports local data files in the libsvm format."
See this Cross Validated Question/Answer for more information on what the libsvm format is.
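As a rough illustration of what such a file looks like, here is a minimal sketch that writes the matrix from the question in libsvm format, where each line holds the label followed by index:value pairs. The helper write_libsvm and the file name random.libsvm are made up for this example and are not part of xgboost. The commented-out call at the end assumes an xgboost version that accepts a libsvm file name as data, so treat it as something to try rather than a guaranteed fix.

```r
# Same toy data as in the question: two features and a numeric label.
set.seed(1)
random <- matrix(rnorm(15), 5, 3)
colnames(random) <- c("first", "second", "label")

# Hypothetical helper (not an xgboost function): write a dense matrix as
# libsvm text, one row per line: "<label> 1:<value> 2:<value> ...".
# Feature indices start at 1 here, the usual libsvm convention.
write_libsvm <- function(x, label_col, path) {
  features <- x[, colnames(x) != label_col, drop = FALSE]
  lines <- vapply(seq_len(nrow(x)), function(i) {
    pairs <- sprintf("%d:%.15g", seq_len(ncol(features)), features[i, ])
    paste(c(sprintf("%.15g", x[i, label_col]), pairs), collapse = " ")
  }, character(1))
  writeLines(lines, path)
  invisible(lines)
}

libsvm_file <- file.path(tempdir(), "random.libsvm")
libsvm_lines <- write_libsvm(random, "label", libsvm_file)

# Assumption: the label is taken from the first field of each line, so no
# separate label argument is needed when training from the file.
# bst <- xgboost(data = libsvm_file, nthread = 7, nround = 3,
#                objective = "reg:linear", verbose = FALSE)
```

Unlike the file produced by write.csv, this one has no header row and no row names, only numeric fields.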
Hope this helps.