I have a dictionary which represents a decision tree:
{'Outlook': {'Overcast': 'Yes', 'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}}, 'Sunny': {'Temperature': {'Cool': 'Yes', 'Hot': 'No', 'Mild': 'No'}}}}
Visualized, it looks like below:
This tree was made with some training data and an ID3 algorithm; I wish to predict the decision for examples from my testing data:
Outlook   Temperature  Humidity  Wind    Decision
Sunny     Mild         Normal    Strong  Yes
Overcast  Mild         High      Strong  Yes
Overcast  Hot          Normal    Weak    Yes
Rain      Mild         High      Strong  No
Using the first example, a rough idea of the order in which things are checked:
Current dict is the root, keyed by 'Outlook'
Examine 'Outlook', found 'Sunny':
'Sunny' maps to a dict, so make the 'Sunny' subdict the current dict
Examine 'Temperature', found 'Mild':
'Mild' does not map to a dict, so return its value 'No'
I'm not sure how to traverse the dictionary like this, however. I've got some code to start with:
def fun(d, t):
    """
    d -- decision tree dictionary
    t -- testing examples in form of pandas dataframe
    """
    for _, e in t.iterrows():
        predict(d, e)

def predict(d, e):
    """
    d -- decision tree dictionary
    e -- a testing example in form of pandas series
    """
    # ?
In predict(), e can be accessed as a dictionary:
print(e.to_dict())
# {'Outlook': 'Rain', 'Temperature': 'Cool', 'Humidity': 'Normal', 'Wind': 'Weak', 'Decision': 'Yes'}
print(e['Outlook'])
# 'Rain'
print(e['Decision'])
# 'Yes'
# etc
I'm just not sure how to traverse the dict. I need to iterate over the testing example in the order attributes appear in the decision tree, not in the order they appear in the testing example.
import pandas as pd

dt = {'Outlook': {'Overcast': 'Yes', 'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}}, 'Sunny': {'Temperature': {'Cool': 'Yes', 'Hot': 'No', 'Mild': 'No'}}}}
df = pd.DataFrame(data=[['Sunny', 'Mild', 'Normal', 'Strong', 'Yes']], columns=['Outlook', 'Temperature', 'Humidity', 'Wind', 'Decision'])

def fun(d, t):
    """
    d -- decision tree dictionary
    t -- testing examples in form of pandas dataframe
    """
    res = []
    for _, e in t.iterrows():
        res.append(predict(d, e))
    return res

def predict(d, e):
    """
    d -- decision tree dictionary
    e -- a testing example in form of pandas series
    """
    # each (sub)tree has exactly one key: the attribute tested at that node
    current_node = list(d.keys())[0]
    # follow the branch matching the example's value for that attribute
    current_branch = d[current_node][e[current_node]]
    # if the branch is a string, it is a leaf, i.e. the decision
    if isinstance(current_branch, str):
        return current_branch
    # otherwise use that branch as the new subtree to search
    else:
        return predict(current_branch, e)

print(fun(dt, df))
output:
['No']
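As an alternative to the recursion above, here is a minimal iterative sketch of the same traversal. It uses plain dicts for the examples so that it runs without pandas; a pandas Series supports the same e[attribute] lookup. The name predict_iterative and the default parameter are my own additions for illustration, not part of the original code: default is returned when an example has a value the tree has no branch for, a case where the recursive version would raise a KeyError.

```python
def predict_iterative(tree, example, default=None):
    """
    tree    -- decision tree dictionary: {attribute: {value: subtree_or_label}}
    example -- mapping from attribute name to value (dict or pandas Series)
    default -- hypothetical fallback for values the tree has no branch for
    """
    node = tree
    # Internal nodes are dicts; anything else is treated as a leaf label.
    while isinstance(node, dict):
        attribute = next(iter(node))   # the single attribute tested at this node
        branches = node[attribute]
        value = example[attribute]
        if value not in branches:      # value not covered by the tree
            return default
        node = branches[value]
    return node


dt = {'Outlook': {'Overcast': 'Yes',
                  'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}},
                  'Sunny': {'Temperature': {'Cool': 'Yes', 'Hot': 'No', 'Mild': 'No'}}}}

columns = ['Outlook', 'Temperature', 'Humidity', 'Wind', 'Decision']
rows = [['Sunny', 'Mild', 'Normal', 'Strong', 'Yes'],
        ['Overcast', 'Mild', 'High', 'Strong', 'Yes'],
        ['Overcast', 'Hot', 'Normal', 'Weak', 'Yes'],
        ['Rain', 'Mild', 'High', 'Strong', 'No']]

examples = [dict(zip(columns, row)) for row in rows]
predictions = [predict_iterative(dt, e) for e in examples]
print(predictions)  # ['No', 'Yes', 'Yes', 'No']
```

Because the attribute to test is read from the tree at each step, the order of the columns in the example does not matter, and attributes the tree never tests (Humidity here) are simply ignored.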