python · scikit-learn · xgboost

How to plot_tree for pipelined MultiOutput Classifier?


I want to interpret my model to understand why it predicts 1 or 0 for each label, so I want to use the plot_tree function from xgboost. Mine is a multi-label classification problem, and I wrote the following code:

import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, shuffle=True, random_state=42)

# One binary XGBoost classifier per label
model = MultiOutputClassifier(
    xgb.XGBClassifier(objective="binary:logistic",
                      colsample_bytree=0.5,
                      gamma=0.1))

# Define a pipeline
pipeline = Pipeline([("preprocessing", col_transformers), ("XGB", model)])

pipeline.fit(X_train, y_train)

predicted = pipeline.predict(X_test)

xgb.plot_tree(pipeline, num_trees=4)

This code gives me the error:

'Pipeline' object has no attribute 'get_dump'

If I change the code to:

xgb.plot_tree(pipeline.named_steps["XGB"], num_trees=4)

I get a similar error:

'MultiOutputClassifier' object has no attribute 'get_dump'

How can I solve this problem?


Solution

  • You can only use the plot_tree function on Booster or XGBModel instances. The first call fails because you are passing a Pipeline object; the second fails because you are passing the MultiOutputClassifier object.

    Instead, you have to pass the fitted XGBClassifier objects. However, be aware of how MultiOutputClassifier actually works:

    This strategy consists of fitting one classifier per target.

    This means you will have one fitted model for each label.

    You can access them with the estimators_ attribute of MultiOutputClassifier. For example, you can retrieve the model for the first label like this:

    xgb.plot_tree(pipeline.named_steps["XGB"].estimators_[0], num_trees=4)
    

    If you want all of them, loop over the list returned by the estimators_ attribute.