Tags: python-2.7, scikit-learn, graphviz, pipeline, decision-tree

export_graphviz gives an error for a decision tree inside a Pipeline


I am trying to create a decision tree and a graph of it.

print "\nCreating Pipeline for the analyzing and training ..."
dt_old = Pipeline([
                    ('bow', CountVectorizer(analyzer=split_into_lemmas)),  # strings to token integer counts
                    ('tfidf', TfidfTransformer()),  # integer counts to weighted TF-IDF scores
                    ('classifier', DecisionTreeClassifier(min_samples_split=20, random_state=99)),  # train on TF-IDF vectors w/ DecisionTree classifier
                ])
print("pipeline:", [name for name, _ in dt_old.steps])
print("-- 10-fold cross-validation , without any grid search")
dt_old.fit(msg_train, label_train)
scores = cross_val_score(dt_old, msg_train, label_train, cv=10)
print "mean: {:.3f} (std: {:.3f})".format(scores.mean(), scores.std())

from sklearn.externals.six import StringIO
import pydot

dot_data = StringIO()
with open("./plots/ritesh.dot", "w") as f:
    export_graphviz(dt_old, out_file=f)

Whenever I try to create a .dot file for the decision tree, I get the following error:

Creating Pipeline for the analyzing and training ...
('pipeline:', ['bow', 'tfidf', 'classifier'])
-- 10-fold cross-validation , without any grid search
mean: 0.960 (std: 0.007)
Traceback (most recent call last):
  File "DecisionTree.py", line 192, in <module>
    main()
  File "DecisionTree.py", line 128, in main
    export_graphviz(dt_old, out_file=f)
  File "/Users/ritesh/anaconda/lib/python2.7/site-packages/sklearn/tree/export.py", line 128, in export_graphviz
    recurse(decision_tree.tree_, 0, criterion=decision_tree.criterion)
AttributeError: 'Pipeline' object has no attribute 'tree_'

Without the pipeline I can generate the .dot file, but with the Pipeline it fails. Am I missing something?

The generated output file contains only:

digraph Tree {

Solution

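  • You should pass the tree object to export_graphviz, not the Pipeline object. As the traceback shows, export_graphviz reads decision_tree.tree_, and only the fitted DecisionTreeClassifier inside your pipeline has that attribute; the Pipeline itself does not. (It writes the opening "digraph Tree {" line before it reaches tree_, which is why your .dot file is left truncated.) To fix this, get the tree classifier out of your pipeline by its step name and pass that to export_graphviz.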

    Try to run your code with these last lines instead:

    with open("./plots/ritesh.dot", "w") as f:
        export_graphviz(dt_old.named_steps['classifier'], out_file=f)
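For a fuller picture, here is a minimal, self-contained sketch of the same fix. The toy messages and labels, the variable names pipe and tree, and the use of CountVectorizer's default analyzer (in place of split_into_lemmas, which isn't shown) are assumptions for illustration only. It also passes feature_names, rebuilt from the vectorizer's vocabulary_, so split nodes show words instead of X[i], and uses out_file=None, which in recent scikit-learn versions makes export_graphviz return the DOT source as a string.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Hypothetical toy data standing in for msg_train / label_train.
messages = [
    "win a free prize now",
    "free entry win cash",
    "are we meeting for lunch",
    "see you at lunch tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]

# Same structure as the question's pipeline; the default analyzer is an
# assumption here because split_into_lemmas is not shown.
pipe = Pipeline([
    ("bow", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("classifier", DecisionTreeClassifier(random_state=99)),
])
pipe.fit(messages, labels)

# The fitted tree lives inside the pipeline; only it has a tree_ attribute.
tree = pipe.named_steps["classifier"]
assert not hasattr(pipe, "tree_") and hasattr(tree, "tree_")

# vocabulary_ maps word -> column index; sorting by index gives the
# feature names in column order, so split nodes show words, not X[i].
vocab = pipe.named_steps["bow"].vocabulary_
feature_names = sorted(vocab, key=vocab.get)

# out_file=None makes export_graphviz return the DOT source as a string.
dot_source = export_graphviz(
    tree,
    out_file=None,
    feature_names=feature_names,
    class_names=[str(c) for c in tree.classes_],
)
print(dot_source.splitlines()[0])  # digraph Tree {
```

If you would rather not depend on the step name, dt_old.steps[-1][1] reaches the last step of the pipeline as well. The returned string can then be written to ./plots/ritesh.dot and rendered with Graphviz, e.g. dot -Tpng ritesh.dot -o ritesh.png.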