Search code examples
pythondata-structurestreeabstract-syntax-tree

How to recover a tree structure from an unfavorable representation of one?


Let's say we have an unfavorable representation of a tree. This tree is not nested, but flattened and its nodes are "connected" only by ids:

{
   "nodes":[
      {
         "id":0,
         "value":"and",
         "children":[
            1,
            4
         ]
      }
      {
         "id":1,
         "value":"or",
         "children":[
            2,
            3
         ]
      },
      {
         "id":4,
         "value":"or",
         "children":[
            5,
            6
         ]
      }
   ],
   "leafs":[
      {
         "id":2,
         "value":"some statement"
      },
      {
         "id":3,
         "value":"some statement"
      },
      {
         "id":5,
         "value":"some statement"
      },
      {
         "id":6,
         "value":"some statement"
      }
   ]
}

You can see that the tree is not only flattened, there is also a rather unnecessary representation of the leafs as a dedicated list. The ids of the leafs are therefore appearing twice: Once as a child in its parent node and once as an identifier of the leaf.

What I want is a nested representation of this tree as dedicated python objects. I have to substitute the "id" with the whole object and get rid of the overly complicated list representation.

This is what I want:

{
  "tree": {
    "id": 0,
    "value": "and",
    "children": [
      {
        "id": 1,
        "value": "or",
        "children": [
          {
            "id": 2,
            "value": "some statement"
          },
          {
            "id": 3,
            "value": "some statement"
          }
        ]
      },
      {
        "id": 4,
        "value": "or",
        "children": [
          {
            "id": 6,
            "value": "some statement"
          },
          {
            "id": 6,
            "value": "some statement"
          }
        ]
      }
    ]
  }
}

How would I start to parse the two lists, so that I can build python objects (node and leaf classes), that reference each other and are representing this tree structure just by reference.

class Node:
    def __init__(self, id, operator):
        self.id = id
        self.value= operator
        self.children = None

class Leaf:
    def __init__(self, id, operator):
        self.id = id
        self.value = None

These are my tree classes, but I don't know how to traverse the two list in a way that brings me to my desired tree.


Solution

  • You can convert your tree data to the dictionary with id as the key and node_data as the value.

    Code:

    
    class Node:
        def __init__(self, node_id: int, operator: str, children: Optional[list] = None):
            self.id = node_id
            self.value = operator
            self.children = children
    
        def __repr__(self):
            return f'Node(id={self.id}, value={self.value})'
    
    
    def build_tree(start: int, nodes: dict):
        node_data = nodes[start]
        node = Node(node_data['id'], node_data['value'])
        if node_data.get('children'):
            node.children = [build_tree(child, nodes) for child in node_data.get('children')]
        return node
    
    
    nodes = {node['id']: node for node in data['leafs']}
    nodes.update({node['id']: node for node in data['nodes']})
    
    root = build_tree(0, nodes)
    

    In the code I have created a class Node that maintains all the values, for leaf nodes children parameter will be None.

    Update:

    If you are using Node class as pydantic model

    class Node(BaseModel):
        id: int
        value: str
        children: Optional[List['Node']]
    
        def to_dict(self) -> dict:
            return {'tree': self.dict()}
    

    In the above pydantic model, I have added to_dict() method that will return that dictionary representation of the tree

    Sample Output:

    {
        'tree': {
            'id': 0,
            'value': 'and',
            'children': [
                {
                    'id': 1,
                    'value': 'or',
                    'children': [
                        {'id': 2, 'value': 'some statement', 'children': None},
                        {'id': 3, 'value': 'some statement', 'children': None},
                    ],
                },
                {
                    'id': 4,
                    'value': 'or',
                    'children': [
                        {'id': 5, 'value': 'some statement', 'children': None},
                        {'id': 6, 'value': 'some statement', 'children': None},
                    ],
                },
            ],
        }
    }