I'm having a weird issue with a new installation of xgboost. Under normal circumstances it works fine. However, when I use the model in the following function it gives the error in the title.
The dataset I'm using is borrowed from kaggle, and can be seen here: https://www.kaggle.com/kemical/kickstarter-projects
The function I use to fit my model is the following:
import pandas as pd
from sklearn.model_selection import cross_val_score, train_test_split

def get_val_scores(model, X, y, return_test_score=False, return_importances=False, random_state=42, randomize=True, cv=5, test_size=0.2, val_size=0.2, use_kfold=False, return_folds=False, stratify=True):
    print("Splitting data into training and test sets")
    if randomize:
        if stratify:
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, stratify=y, shuffle=True, random_state=random_state)
        else:
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, shuffle=True, random_state=random_state)
    else:
        if stratify:
            # Note: scikit-learn raises a ValueError for stratify combined with shuffle=False
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, stratify=y, shuffle=False)
        else:
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, shuffle=False)
    print(f"Shape of training data, X: {X_train.shape}, y: {y_train.shape}. Test, X: {X_test.shape}, y: {y_test.shape}")

    if use_kfold:
        val_scores = cross_val_score(model, X=X_train, y=y_train, cv=cv)
    else:
        print("Further splitting training data into validation sets")
        if randomize:
            if stratify:
                X_train_, X_val, y_train_, y_val = train_test_split(X_train, y_train, test_size=val_size, stratify=y_train, shuffle=True)
            else:
                X_train_, X_val, y_train_, y_val = train_test_split(X_train, y_train, test_size=val_size, shuffle=True)
        else:
            if stratify:
                print("Warning! You opted to both stratify your training data and to not randomize it. These settings are incompatible with scikit-learn. Stratifying the data, but shuffle is being set to True")
                X_train_, X_val, y_train_, y_val = train_test_split(X_train, y_train, test_size=val_size, stratify=y_train, shuffle=True)
            else:
                X_train_, X_val, y_train_, y_val = train_test_split(X_train, y_train, test_size=val_size, shuffle=False)
        print(f"Shape of training data, X: {X_train_.shape}, y: {y_train_.shape}. Val, X: {X_val.shape}, y: {y_val.shape}")
        print("Getting ready to fit model.")
        model.fit(X_train_, y_train_)
        val_score = model.score(X_val, y_val)

    if return_importances:
        if hasattr(model, 'steps'):
            # Pipeline: reads importances from the second-to-last step
            try:
                feats = pd.DataFrame({
                    'Columns': X.columns,
                    'Importance': model[-2].feature_importances_
                }).sort_values(by='Importance', ascending=False)
            except Exception:
                # The model may not be fitted yet (cross_val_score fits clones), so fit and retry
                model.fit(X_train, y_train)
                feats = pd.DataFrame({
                    'Columns': X.columns,
                    'Importance': model[-2].feature_importances_
                }).sort_values(by='Importance', ascending=False)
        else:
            try:
                feats = pd.DataFrame({
                    'Columns': X.columns,
                    'Importance': model.feature_importances_
                }).sort_values(by='Importance', ascending=False)
            except Exception:
                model.fit(X_train, y_train)
                feats = pd.DataFrame({
                    'Columns': X.columns,
                    'Importance': model.feature_importances_
                }).sort_values(by='Importance', ascending=False)

    mod_scores = {}
    try:
        mod_scores['validation_score'] = val_scores.mean()
        if return_folds:
            mod_scores['fold_scores'] = val_scores
    except NameError:
        # val_scores only exists when use_kfold=True
        mod_scores['validation_score'] = val_score

    if return_test_score:
        mod_scores['test_score'] = model.score(X_test, y_test)

    if return_importances:
        return mod_scores, feats
    else:
        return mod_scores
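The train_test_split calls in each stage differ only in their shuffle and stratify arguments, so they could be collapsed into a single call per stage. A minimal sketch, assuming a hypothetical helper named split_kwargs (it is not part of the function above or of scikit-learn) that builds the keyword arguments and applies the same shuffle override the function already uses for the validation split:

```python
import warnings


def split_kwargs(labels, test_size=0.2, randomize=True, stratify=True, random_state=None):
    """Build keyword arguments for sklearn's train_test_split (hypothetical helper)."""
    shuffle = randomize
    if stratify and not shuffle:
        # scikit-learn cannot stratify without shuffling, so force shuffle=True.
        warnings.warn("stratify requires shuffling; setting shuffle=True")
        shuffle = True
    kwargs = {"test_size": test_size, "shuffle": shuffle}
    if stratify:
        kwargs["stratify"] = labels
    if shuffle:
        # random_state only has an effect when the data is shuffled.
        kwargs["random_state"] = random_state
    return kwargs
```

With this, each split could be written as one line, for example X_train, X_test, y_train, y_test = train_test_split(X, y, **split_kwargs(y, test_size=test_size, randomize=randomize, stratify=stratify, random_state=random_state)).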
The weird part that I'm running into is that if I create a pipeline in sklearn, it works on the dataset outside of the function, but not within it. For example:
from sklearn.pipeline import make_pipeline
from category_encoders import OrdinalEncoder
from xgboost import XGBClassifier
pipe = make_pipeline(OrdinalEncoder(), XGBClassifier())
X = df.drop('state', axis=1)
y = df['state']
In this case, pipe.fit(X, y) works just fine. But get_val_scores(pipe, X, y) fails with the error message in the title. What's weirder is that get_val_scores(pipe, X, y) seems to work with other datasets, like Titanic. The error occurs as the model is fitting on X_train and y_train.
In this case the loss function is binary:logistic, and the state column has the values successful and failed.
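Since the failure is specific to this dataset, one quick diagnostic is to confirm which label values actually reach the model in each split. The sketch below assumes a hypothetical helper named class_counts and uses plain lists as stand-ins for y_train_ and y_val; it is a sanity check, not a confirmed explanation of the error:

```python
from collections import Counter


def class_counts(labels):
    """Return a dict mapping each distinct label to how often it occurs."""
    return dict(Counter(labels))


# Illustrative stand-ins for y_train_ and y_val (made-up values, not real data).
y_train_example = ["failed", "successful", "failed", "failed"]
y_val_example = ["successful", "failed"]

print(class_counts(y_train_example))
print(class_counts(y_val_example))
```

For a pandas Series, y_train_.value_counts() gives the same information.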
The xgboost library is currently being updated to fix this bug, so for now the workaround is to downgrade to an older version. I solved the problem by downgrading to xgboost v0.90.
Check your xgboost version from Python:
import xgboost
print(xgboost.__version__)
If the version is not 0.90, uninstall the current version:
pip uninstall xgboost
Then install xgboost version 0.90:
pip install xgboost==0.90
Finally, run your code again.
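The version check can also be scripted so that it does not fail when the package is missing. A minimal sketch using the standard library's importlib.metadata (Python 3.8 or later); the helper name installed_version is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("xgboost")
if found is None:
    print("xgboost is not installed")
elif found != "0.90":
    print(f"xgboost {found} is installed; the workaround above targets 0.90")
else:
    print("xgboost 0.90 is installed")
```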