Search code examples
pythonjsondecision-treedynamically-generated

How to generate multi-level decision tree from a JSON structure using Python?


Currently I am solving a problem on where I have to create a flexible multi-dimensional decision tree from a JSON structure in Python. There are composed decision rules in the JSON and each decision rule should correspond to the node of the decision tree. Here’s an example JSON structure:Here’s an example JSON structure:

{
  "rule": "A > 5",
  "true": {
    "rule": "B < 3",
    "true": {
      "rule": "C == 1",
      "true": "Leaf 1",
      "false": "Leaf 2"
    },
    "false": {
      "rule": "D != 4",
      "true": "Leaf 3",
      "false": "Leaf 4"
    }
  },
  "false": {
    "rule": "E >= 2",
    "true": {
      "rule": "F < 6",
      "true": "Leaf 5",
      "false": "Leaf 6"
    },
    "false": {
      "rule": "G == 0",
      "true": "Leaf 7",
      "false": "Leaf 8"
    }
  }
}

I want to create a decision tree using Python where each node is an instance of the TreeNode class:I want to create a decision tree using Python where each node is an instance of the TreeNode class:

class TreeNode:
    def __init__(self, rule, truebranch, falsebranch):
        self.rule = rule
        self.truebranch = truebranch
        self.falsebranch = falsebranch

    def evaluate(self, conditions):
        # Implementation needed
        pass

What I Need Guidance On:What I Need Guidance On: Parsing Rules: How should one identify the operators and values within the rule strings given in the JSON to correctly, or more specifically, efficiently, evaluate the conditions?

Evaluate Method: In what way should the evaluate method be used in order to correctly handle with conditions and go through the tree structure?

Handling Edge Cases: When does the problem of errors occur, and how can one avoid it, in relation to invalid rules and other instances where the data type of the corresponding leaf nodes is improper?

What I’ve Tried: Proposed a way of attempted parsing rule strings and handling of different data types of leaf nodes. Completed only a part of the lab having implemented a partial version of the evaluate method. Now concerning rules I have specific questions on how to derive practical strategies for reading them, using the evaluate method for implementing my decision tree and specific ways of dealing with nuisance cases.


Solution

  • As long as you have an "actual" condition(or any code for that matter) in your string you dont need to use any formatting. the function eval(expression, globals, locals) can take any string representing of a python code and execute it. for example:

    eval("print(3)") # output: 3
    print(eval("5 >= 12")) # output: False
    print(eval("A > 5", {"A": 6})) # output: True
    

    the only thing you need to notice is the stopping point for the leafs. there are a number of ways to do that, but this is my prefered way:

    class TreeNode: 
        rule: str
        truebranch: 'TreeNode'
        falsebranch: 'TreeNode' 
        def __init__(self, data_json): 
            self.rule = data_json['rule']
            self.truebranch = None
            self.falsebranch = None
            if type(data_json["true"]) is dict:
                self.truebranch = TreeNode(data_json["true"])
            if type(data_json["false"]) is dict:
                self.falsebranch = TreeNode(data_json["false"])
        
        def evaluate(self, conditions):
            cond_result = eval(self.rule, conditions)
            if cond_result:
                if self.truebranch is not None:
                    return self.truebranch.evaluate(conditions)
            else:
                if self.falsebranch is not None:
                    return self.falsebranch.evaluate(conditions)
            return cond_result
    

    this solution assumes the dict is in the correct format and that "Leaf x" has no special meaning