Visualize a Yes/No tree using Graphviz


I have symptom-diagnosis questionnaire data in Python in the following form: a list of dictionaries, one per path. Here is an example of a diagnosis that starts with an initial symptom (A) and continues with the follow-up questions after it.

 qa=  [OrderedDict([('A', 1), ('B', 1), ('F', 1), ('C', 1), ('D', 1), ('E', 1)]),
 OrderedDict([('A', 1), ('B', 1), ('F', 1), ('C', 1), ('D', 1), ('E', 0)]),
 OrderedDict([('A', 1), ('B', 1), ('F', 1), ('C', 1), ('D', 0), ('E', 1)]),
 OrderedDict([('A', 1), ('B', 1), ('F', 1), ('C', 1), ('D', 0), ('E', 0)]),
 OrderedDict([('A', 1), ('B', 1), ('F', 1), ('C', 0), ('D', 1), ('E', 1)]),
 OrderedDict([('A', 1), ('B', 1), ('F', 1), ('C', 0), ('D', 1), ('E', 0)]),
 OrderedDict([('A', 1), ('B', 1), ('F', 1), ('C', 0), ('D', 0), ('E', 1)]),
 OrderedDict([('A', 1), ('B', 1), ('F', 1), ('C', 0), ('D', 0), ('E', 0)]),
 OrderedDict([('A', 1), ('B', 1), ('F', 0), ('E', 1), ('D', 1), ('C', 1)]),
 OrderedDict([('A', 1), ('B', 1), ('F', 0), ('E', 1), ('D', 1), ('C', 0)]),
 OrderedDict([('A', 1), ('B', 1), ('F', 0), ('E', 1), ('D', 0), ('C', 1)]),
 OrderedDict([('A', 1), ('B', 1), ('F', 0), ('E', 1), ('D', 0), ('C', 0)]),
 OrderedDict([('A', 1), ('B', 1), ('F', 0), ('E', 0), ('D', 1), ('C', 1)]),
 OrderedDict([('A', 1), ('B', 1), ('F', 0), ('E', 0), ('D', 1), ('C', 0)]),
 OrderedDict([('A', 1), ('B', 1), ('F', 0), ('E', 0), ('D', 0), ('C', 1)]),
 OrderedDict([('A', 1), ('B', 1), ('F', 0), ('E', 0), ('D', 0), ('C', 0)]),
 OrderedDict([('A', 1), ('B', 0), ('F', 1), ('C', 1), ('D', 1), ('E', 1)]),
 OrderedDict([('A', 1), ('B', 0), ('F', 1), ('C', 1), ('D', 1), ('E', 0)]),
 OrderedDict([('A', 1), ('B', 0), ('F', 1), ('C', 1), ('D', 0), ('E', 1)]),
 OrderedDict([('A', 1), ('B', 0), ('F', 1), ('C', 1), ('D', 0), ('E', 0)]),
 OrderedDict([('A', 1), ('B', 0), ('F', 1), ('C', 0), ('D', 1), ('E', 1)]),
 OrderedDict([('A', 1), ('B', 0), ('F', 1), ('C', 0), ('D', 1), ('E', 0)]),
 OrderedDict([('A', 1), ('B', 0), ('F', 1), ('C', 0), ('D', 0), ('E', 1)]),
 OrderedDict([('A', 1), ('B', 0), ('F', 1), ('C', 0), ('D', 0), ('E', 0)]),
 OrderedDict([('A', 1), ('B', 0), ('F', 0), ('C', 1), ('D', 1), ('E', 1)]),
 OrderedDict([('A', 1), ('B', 0), ('F', 0), ('C', 1), ('D', 1), ('E', 0)]),
 OrderedDict([('A', 1), ('B', 0), ('F', 0), ('C', 1), ('D', 0), ('E', 1)]),
 OrderedDict([('A', 1), ('B', 0), ('F', 0), ('C', 1), ('D', 0), ('E', 0)]),
 OrderedDict([('A', 1), ('B', 0), ('F', 0), ('C', 0), ('E', 1), ('D', 1)]),
 OrderedDict([('A', 1), ('B', 0), ('F', 0), ('C', 0), ('E', 1), ('D', 0)]),
 OrderedDict([('A', 1), ('B', 0), ('F', 0), ('C', 0), ('E', 0), ('D', 1)]),
 OrderedDict([('A', 1), ('B', 0), ('F', 0), ('C', 0), ('E', 0), ('D', 0)])]
    

Here 1 = Yes and 0 = No.

I would like to plot the diagnosis as a decision tree in which each node splits into "Yes"/"No" edges that lead to the next node, and so on.

Using graphviz, I grouped the "Yes" and "No" labels when both are available for the same question, because otherwise one edge overwrites the other:

import pandas as pd
from graphviz import Digraph

name = 'test'  # graph name; not defined in the original snippet, any string works
u = Digraph(name, strict=True, filename='blabla', format='png',
            node_attr={'color': 'mediumpurple1', 'style': 'filled'})
u.attr(size='16,16')
answer_map = ['No', 'Yes']
nodes = []  # (source, target) pairs, one per edge
edges = []  # answer label for each pair
for path in qa:
    # Name each node "<question>_<level>"
    questions = [f'{j}_{lev}' for lev, j in enumerate(path.keys(), 1)]
    questions = [w.replace(':', '_') for w in questions]
    answers = [answer_map[item] for item in path.values()]
    for i in range(len(questions) - 1):
        #u.edge(questions[i], questions[i+1],label=answers[i])
        nodes.append((questions[i], questions[i + 1]))
        edges.append(answers[i])
d = {'nodes': nodes, 'edges': edges}
df_graph = pd.DataFrame(d).drop_duplicates()
# Join the labels of parallel edges between the same two nodes, e.g. 'Yes,No'
df_graph_joined = df_graph.groupby('nodes')['edges'].apply(','.join).reset_index()

for row in df_graph_joined.itertuples():
    u.edge(row.nodes[0], row.nodes[1], label=row.edges)
u.render()


But in the resulting graph it is impossible to distinguish the individual diagnosis paths. I would like to split the tree at each "Yes"/"No" junction so that every diagnostic path is visible in the tree. How do I do that?

I would like it to look like a regular binary tree, with separate "Yes" and "No" branches under every question.


Solution

  • To split the tree at each answer, every node needs a distinct name. I suggest naming each node after the full path that leads to it. For example, for this OrderedDict:

    ('A', 1), ('B', 1), ('F', 1), ('C', 1), ('D', 1), ('E', 1)
    

    you can use node names like these:

    root, root-A1, root-A1-B1, root-A1-B1-F1, root-A1-B1-F1-C1, root-A1-B1-F1-C1-D1
    

    In this example:

    • root represents A
    • root-A1 represents B, reached via A = 1
    • root-A1-B1 represents F, reached via A = 1, then B = 1
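
The naming scheme above can be sketched as a small helper. This is only an illustration: `path_node_ids` is a hypothetical name, not something from graphviz, and it assumes each path is an ordered mapping of question to 0/1 answer.

```python
from collections import OrderedDict

def path_node_ids(path, root='root'):
    """Return one node id per question: the chain of answers leading to it."""
    ids = []
    prefix = root
    for question, answer in path.items():
        ids.append(prefix)  # id of the node that asks `question`
        prefix += '-{}{}'.format(question, answer)
    return ids

example = OrderedDict([('A', 1), ('B', 1), ('F', 1), ('C', 1), ('D', 1), ('E', 1)])
print(path_node_ids(example))
```

Two paths share a node id only while their answers agree, so the ids diverge at the first question answered differently, which is what makes the graph branch.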

    Here is an example:

    from collections import OrderedDict
    from graphviz import Digraph

    # New node names: each node id is the path of answers that leads to it,
    # so the same question reached through different answers becomes a distinct node.
    qa_tree = []
    for path in qa:
        prefix = 'root'
        path_tree = OrderedDict()
        for key, value in path.items():
            # The id is the prefix before this question's own answer is appended.
            path_tree[prefix] = {'value': value, 'name': key}
            prefix += '-{}{}'.format(key, value)
        qa_tree.append(path_tree)

    name = 'test'
    u = Digraph(name, strict=True, filename='blabla', format='png',
                node_attr={'color': 'mediumpurple1', 'style': 'filled'})
    u.attr(size='16,16')
    answer_map = ['No', 'Yes']

    for path in qa_tree:
        questions = list(path.keys())
        answers = [answer_map[item['value']] for item in path.values()]
        names = [item['name'] for item in path.values()]
        for i in range(len(questions) - 1):
            # Ids are unique per path; the label shows only the question name.
            u.node(questions[i], label=names[i])
            u.node(questions[i + 1], label=names[i + 1])
            u.edge(questions[i], questions[i + 1], label=answers[i])
    u.render()
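
If the graphviz Python package is not available, the same idea can be written out as plain DOT text. The following is a minimal sketch under that assumption; `tree_to_dot` and the three-path `sample` are made up for illustration. Like the code above, it draws no edge for the answer to the last question in a path.

```python
from collections import OrderedDict

ANSWER_LABELS = ['No', 'Yes']

def tree_to_dot(paths, root='root'):
    """Build DOT source in which every node id is the answer path leading to it."""
    labels = {}  # node id -> question shown inside the node
    edges = {}   # (parent id, child id) -> 'Yes' or 'No'
    for path in paths:
        prefix = root
        parent = None  # (id, answer) of the previous question on this path
        for question, answer in path.items():
            labels[prefix] = question
            if parent is not None:
                edges[(parent[0], prefix)] = ANSWER_LABELS[parent[1]]
            parent = (prefix, answer)
            prefix += '-{}{}'.format(question, answer)
    lines = ['digraph tree {']
    for node_id, question in labels.items():
        lines.append('    "{}" [label="{}"];'.format(node_id, question))
    for (src, dst), answer in edges.items():
        lines.append('    "{}" -> "{}" [label="{}"];'.format(src, dst, answer))
    lines.append('}')
    return '\n'.join(lines)

sample = [
    OrderedDict([('A', 1), ('B', 1), ('F', 1)]),
    OrderedDict([('A', 1), ('B', 1), ('F', 0)]),
    OrderedDict([('A', 1), ('B', 0), ('F', 1)]),
]
dot_source = tree_to_dot(sample)
print(dot_source)
```

Saving the printed text to a file such as tree.dot and running `dot -Tpng tree.dot -o tree.png` should render it, provided the Graphviz command-line tools are installed.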
    
