Tags: python, pandas, jupyter, graphviz

Creating a decision tree using Python


I am creating a decision tree from a dataset named "wine". I am trying to run the following code:

dt = c.fit(X_train, y_train)

Creating the image of the decision tree:

Here "Malik Shahid Ali" is the location/path of the image:

def show_tree(tree, features, path):
    f = io.StringIO()
    export_graphviz(tree, out_file=f, feature_names=features)
    pydotplus.graph_from_dot_data(f.getvalue()).write_png("Malik Shahid Ali")
    img = misc.imread("Malik Shahid Ali")
    plt.imshow(img)

Calling the function:

show_tree(dt, features, 'dec_tree_01.png')

But when I call it, I get the following error:

GraphViz's executables not found

Import section:

import numpy as np
import pandas as pd
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.model_selection import train_test_split
import graphviz
import pydotplus
import io
from scipy import misc
import matplotlib.pyplot as plt #sets up plotting under plt
import seaborn as sb
from pylab import rcParams

Reading the CSV dataset:

data=pd.read_csv('C:/Users/malik/Desktop/wine.csv',low_memory=False)
data.head()

train, test = train_test_split(data,test_size=0.15)

print("Training size: {} Test size: {}".format(len(train),len(test)))

c=DecisionTreeClassifier(min_samples_split=2)

features = ["id","Alcohol","Malic acid","Ash","Alcalinity of ash","Magnesium","Total phenols","Flavanoids","Field9Nonflavanoid phenols","Proanthocyanins","Color intensity","Hue","OD280/OD315 of diluted wines","Proline"]

X_train = train[features]
y_train = train["id"]

X_test = test[features]
y_test = test["id"]

y_test

dt = c.fit(X_train, y_train)

Path of the executable files:

import os     
os.environ["PATH"] += os.pathsep + 'E:\Graphviz2.38\bin'

Image function:

def show_tree(tree, features, path):
    f = io.StringIO()
    export_graphviz(tree, out_file=f, feature_names=features)
    pydotplus.graph_from_dot_data(f.getvalue()).write_png(path)
    img = misc.imread(path)

    plt.imshow(img)

show_tree(dt, features, 'dec_tree_01.png')

On this call, Jupyter raises the following error:

E:\python\lib\site-packages\pydotplus\graphviz.py in create(self, prog, format)
   1958             if self.progs is None:
   1959                 raise InvocationException(
-> 1960                     'GraphViz\'s executables not found')
   1961 
   1962         if prog not in self.progs:

InvocationException: GraphViz's executables not found

Solution

  • I'm re-purposing my answer to a related problem here.

    Make sure you have installed the actual GraphViz executables, not just the Python package. I installed them with conda (recommended over pip install graphviz, because the pip package does not include the GraphViz executables).

    Update

    In the end, the cause was an incorrectly formatted path string added to the PATH environment variable. In 'E:\Graphviz2.38\bin', Python reads \b as a backspace escape character, so the directory that ends up on PATH does not exist. Be sure to use double backslashes in the path string, e.g.:

    import os
    os.environ["PATH"] += os.pathsep + 'E:\\Graphviz2.38\\bin\\'