I have a decision tree created in R using the Titanic example. This tree is validated and correct. (decision tree R)
Now I am creating the same tree in Python, using the exact same dataset and columns. I do this with Graphviz, but since I cannot import it in Python itself (Spyder), I export the tree to a Graphviz .dot file and then render the graph on their website http://webgraphviz.com/
The code I use for exporting is:
import sklearn.tree as tree

tree.export_graphviz(rpart, out_file="tree.dot", filled=True,
                     feature_names=list(titanic_dmy.drop(['survived'], axis=1).columns),
                     impurity=False, label=None, proportion=True,
                     class_names=['Survived', 'Died'])
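For context, here is a minimal, self-contained sketch of the same export step. The tiny DataFrame below is a made-up stand-in for the Titanic data (the real `rpart` classifier and `titanic_dmy` columns are assumptions from the question); with `out_file=None`, `export_graphviz` returns the dot source as a string instead of writing a file:

```python
import pandas as pd
from sklearn import tree

# Toy stand-in for the Titanic data: 'male' and 'age' features, 'survived' target.
titanic_dmy = pd.DataFrame({
    "male":     [1, 1, 0, 0, 1, 0],
    "age":      [22, 38, 26, 35, 54, 2],
    "survived": [0, 1, 1, 1, 0, 1],
})

X = titanic_dmy.drop(["survived"], axis=1)
y = titanic_dmy["survived"]

# Fit a small tree; 'rpart' mirrors the variable name used in the question.
rpart = tree.DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# out_file=None returns the dot source as a string, which you can paste
# into http://webgraphviz.com/ or write to "tree.dot" yourself.
dot = tree.export_graphviz(
    rpart, out_file=None, filled=True,
    feature_names=list(X.columns),
    impurity=False, proportion=True,
    class_names=["Died", "Survived"],  # class 0 first, then class 1
)
print(dot[:7])  # the dot source starts with 'digraph'
```

Note that `class_names` is applied in ascending order of the class labels, so the name for class 0 must come first.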
The created tree looks like this
The numbers do not match 100%, but they are very close. The problem here is that the tree created by Python is the exact opposite of what R created.
For example: R shows that if you are male, you go to the 'age' box, and if you are female, you go to the 'Third class' box. In Python, however, this is shown the other way around: male goes to 'Third class' and female goes to 'age'. This affects the final result, since R shows that the female survives and Python shows that the male survives.
Does someone have any idea what went wrong here?
The full code, with support datasets can be found on OneDrive: https://1drv.ms/u/s!AjkQWQ6EO_fMiSVkhk9yIqsdlA-4
Regards, Ganesh
I think you are reading the tree the wrong way around, and the two trees are actually quite similar.
If you are female, then (male <= 0.5) is True, so you go to the box on the left, 'Third class'. If you are male, then (male <= 0.5) is False, so you go to the box on the right, 'Age'.
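This convention can be checked directly: in a fitted sklearn tree, samples for which the split test (feature <= threshold) is True always go to the left child, and the rest go to the right child. A minimal sketch with a single made-up binary feature 'male' (0 = female, 1 = male):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# One binary feature 'male'; the label differs by sex, so the root must split on it.
X = np.array([[0], [0], [1], [1]])
y = np.array([1, 1, 0, 0])

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
t = clf.tree_

root = 0
print(t.threshold[root])  # 0.5 -> the root test is (male <= 0.5)

# Females (male=0) satisfy male <= 0.5, so their decision path goes through
# the LEFT child; males (male=1) fail the test and go through the RIGHT child.
print(clf.decision_path([[0]]).indices)  # [root, left child]
print(clf.decision_path([[1]]).indices)  # [root, right child]
```

So the Python tree is not mirrored relative to the R one; the left branch is simply the "condition True" branch.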