I want to interpret my model, to learn why it gives me 1 or 0 for labels, so I want to use the plot_tree function from xgboost. My problem is a multi-label classification problem. I wrote the following code:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, shuffle=True, random_state=42)
model = MultiOutputClassifier(
    xgb.XGBClassifier(objective="binary:logistic",
                      colsample_bytree=0.5,
                      gamma=0.1)
)
# Define a pipeline
pipeline = Pipeline([("preprocessing", col_transformers), ("XGB", model)])
pipeline.fit(X_train, y_train)
predicted = pipeline.predict(X_test)
xgb.plot_tree(pipeline, num_trees=4)
This code gives me the error:
'Pipeline' object has no attribute 'get_dump'
If I change the code to:
xgb.plot_tree(pipeline.named_steps["XGB"], num_trees=4)
I get:
'MultiOutputClassifier' object has no attribute 'get_dump'
How can I solve this problem?
You can only use the plot_tree function on Booster or XGBModel instances. Your first call fails because you are passing a Pipeline object, and the second fails because you are passing the MultiOutputClassifier object.
Instead, you have to pass one of the fitted XGBClassifier objects. However, be aware of how MultiOutputClassifier actually works:
This strategy consists of fitting one classifier per target.
This means you will have one fitted model for each label. You can access them through the estimators_ attribute of MultiOutputClassifier. For example, you can retrieve the model for the first label like this:
xgb.plot_tree(pipeline.named_steps["XGB"].estimators_[0], num_trees=4)
If you want all of them, loop over the list returned by the estimators_ attribute.