I am trying to create model using XGBoost.
It seems like I manage to train the model, however, when I try to predict my test data and to see the actual prediction, I get the following error:
ValueError: Data must be 1-dimensional
This is how I tried to predict my data:
from dask_ml.model_selection import train_test_split
import dask
import xgboost
import dask_xgboost
from dask.distributed import Client
import dask_ml.model_selection as dcv
#split the data
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.33,random_state=42)
client = Client(n_workers=10, threads_per_worker=1)
#Trying to do hyperparamter running
model_xgb = xgb.XGBRegressor(seed=42,verbose=True)
params={
'learning_rate':[0.1,0.01,0.05],
'max_depth':[1,5,8],
'gamma':[0,0.5,1],
'scale_pos_weight':[1,3,5]
}
grid_search = GridSearchCV(model_xgb, params, cv=3, scoring='neg_mean_squared_error')
grid_search.fit(x_train, y_train)
#train data with best paraeters
bst = dask_xgboost.train(client, grid_search.best_params_, x_train, y_train, num_boost_round=10)
#predict data
dask_xgboost.predict(client, bst, x_test).persist()
The last line with the predict works, but when I addl compute to the endd in order to see the actual array I get the dimensional error:
dask_xgboost.predict(client, bst, x_test).persist().compute()
>>>ValueError: Data must be 1-dimensional
How can I get predictions with .predict
?
As noted in the pip
page for dask-xgboost
:
Dask-XGBoost has been deprecated and is no longer maintained.
The functionality of this project has been included directly
in XGBoost. To use Dask and XGBoost together, please use
xgboost.dask instead
https://xgboost.readthedocs.io/en/latest/tutorials/dask.html.
The code you provided has a few missing assignments and expressions (e.g. how x
is defined, where GridSearchCV
is imported from). A few things that probably should be changed:
# note the .dask
model_xgb = xgb.dask.DaskXGBRegressor(seed=42, verbose=True)
grid_search = GridSearchCV(model_xgb, params, cv=3, scoring='neg_mean_squared_error')
grid_search.fit(x_train, y_train)
#train data with best params
model_xgb.client = client
model_xgb.set_params(grid_search.best_params_)
model_xgb.fit(X_train, y_train, eval_set=[(X_test, y_test)])