How to Create a JSON Tree from a Tabulated Hierarchy in Python


I'm sure there's an elegant way of doing this in Python (or, at a push, JavaScript), but for the life of me I can't see it.

I have a CSV file of the form:

ID, Name, Description
A, A-name,
A100, A100-name, A100-desc
A110, A110-name, A110-desc
A111, A111-name, A111-desc
A112, A112-name, A112-desc
A113, A113-name, A113-desc
A120, A120-name, A120-desc
A131, A131-name, A131-desc
A200, A200-name, A200-desc
B, B-name,
B100, B100-name, B100-desc
B130, B130-name, B130-desc
B131, B131-name, B131-desc
B140, B140-name, B140-desc

and I want to generate a hierarchical JSON structure so I can visualise the data in theJIT, which expects each node in this form:

var json = {  
  "id": "aUniqueIdentifier",  
  "name": "usually a nodes name",  
  "data": {  
    "some key": "some value",  
    "some other key": "some other value"  
   },  
  "children": [ *other nodes or empty* ]  
}; 

My plan was to map ID to id, Name to name, Description to data.desc, and organise the hierarchy so that:

  • Root is parent of A and B
  • A is parent of A100 and A200
  • A100 is parent of A110 and A120
  • A110 is parent of A111, A112 and A113
  • B is parent of B100
  • B100 is parent of B130 and B140
  • B130 is parent of B131

There is also a pathological case in the otherwise regular ordering by ID, where A100 is parent of A131 (the expected A130 is not present).

I was hoping to find an elegant Python solution to this, but it's defeating me at the moment, even ignoring the pathological case...


Solution

  • This does it (Python 3). Each node decides whether an incoming row belongs beneath it by comparing ID prefixes, so every parent row must appear before its children, as in your sample. Trailing zeros are treated as padding, which also puts A131 under A100.

    import csv
    import json
    
    class Node(dict):
        def __init__(self, row):
            nid, name, ndescr = row  # one CSV row: ID, Name, Description
            dict.__init__(self)
            self['id'] = nid
            self['name'] = name.lstrip()  # the CSV has a space after each comma
            self['description'] = ndescr.lstrip()
            self['children'] = []
    
        def add_node(self, node):
            # Hand the node to the first child that claims it;
            # if none does, it becomes a direct child of this node.
            for child in self['children']:
                if child.is_parent(node):
                    child.add_node(node)
                    break
            else:
                self['children'].append(node)
    
        def is_parent(self, node):
            # Trailing zeros are padding: A100 covers A1xx, A110 covers A11x.
            return node['id'].startswith(self['id'].rstrip('0'))
    
    class RootNode(Node):
        def __init__(self):
            Node.__init__(self, ('Root', '', ''))
    
        def is_parent(self, node):
            return True
    
    def pretty_print(node, i=0):
        print('%sID=%s NAME=%s %s' % ('\t' * i, node['id'], node['name'], node['description']))
        for child in node['children']:
            pretty_print(child, i + 1)
    
    def main():
        with open('input.csv', newline='') as f:
            f.readline()  # skip the header row
            root = RootNode()
            for row in csv.reader(f):
                if row:  # ignore blank lines
                    root.add_node(Node(row))
    
        pretty_print(root)
        print(json.dumps(root))
    
    if __name__ == '__main__':
        main()
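
If you want the exact shape from the question (Description stored under data.desc), here is a minimal sketch of a stack-based variant. It is not part of the answer above. It assumes the rows are sorted by ID as in the sample and that trailing zeros in an ID are padding. The names `key`, `build_tree` and `sample` are made up for illustration, and the root's name is a placeholder.

```python
import csv
import io
import json

def key(node_id):
    # Assumption: trailing zeros are padding, so A100 -> "A1", A110 -> "A11".
    return node_id.rstrip('0')

def build_tree(lines):
    root = {'id': 'Root', 'name': 'Root', 'data': {}, 'children': []}
    stack = [('', root)]  # (prefix, node) pairs for the current ancestor chain
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader)  # skip the header row
    for row in reader:
        if not row:
            continue  # ignore blank lines
        nid, name, desc = row
        node = {'id': nid, 'name': name, 'data': {'desc': desc}, 'children': []}
        # Pop until the top of the stack is an ancestor of this ID.
        # The root's empty prefix matches everything, so the stack never empties.
        while not nid.startswith(stack[-1][0]):
            stack.pop()
        stack[-1][1]['children'].append(node)
        stack.append((key(nid), node))
    return root

# A shortened copy of the sample data, including the A131 case.
sample = """ID, Name, Description
A, A-name,
A100, A100-name, A100-desc
A110, A110-name, A110-desc
A111, A111-name, A111-desc
A120, A120-name, A120-desc
A131, A131-name, A131-desc
A200, A200-name, A200-desc
B, B-name,
B100, B100-name, B100-desc
"""

tree = build_tree(io.StringIO(sample))
print(json.dumps(tree, indent=2))
```

Because each row is compared only against the current ancestor chain, this makes a single pass over the file. The trade-off is the same as above: a child that appears before its parent ends up attached one level too high.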