python networkx graphviz dot pydot

How do I extract edges that contain a certain label in a dot file?


My objective is to extract the paths in a dot file whose edges carry a certain label. This is the first time I have worked with a dot file, and I don't know how to read its labels from Python. For instance, in the dot file below, I want to extract the path that belongs to the label "V1". Here is my dot file:

digraph "MVICFG" {
    label="MVICFG";
/* Generating Nodes */
    subgraph cluster_1 {
        label="main";
        "6" [label="4294967294::Entry::main"];
        "2" [label="0::  %1 = alloca i32, align 4"];
        "3" [label="0::  store i32 0, i32* %1, align 4"];
        "4" [label="0::  %2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([12 x i8], [12 x i8]* @.str, i32 0, i32 0)), !dbg !13"];
        "5" [label="4::  ret i32 0, !dbg !14"];
        "7" [label="4294967293::Exit::main"];
        "11" [label="3::  %1 = alloca i32, align 4"];
        "12" [label="3::  store i32 0, i32* %1, align 4"];
        "13" [label="3::  %2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str, i32 0, i32 0)), !dbg !13"];
    }
    subgraph cluster_9 {
        label="External_Node_Func";
        "10" [label="4294967294::External_Node"];
    }

/* Generating Edges */
        "2" -> "3" [arrowhead = normal, penwidth = 1.0, color = black, label="V1"];
        "3" -> "4" [arrowhead = normal, penwidth = 1.0, color = black, label="V1"];
        "6" -> "2" [arrowhead = normal, penwidth = 1.0, color = pink, label="V1::Virtual"];
        "5" -> "7" [arrowhead = normal, penwidth = 1.0, color = pink, label="V1,V2::Virtual"];
        "4" -> "5" [arrowhead = normal, penwidth = 1.0, color = black, label="V1"];
        "6" -> "11" [arrowhead = normal, penwidth = 1.0, color = pink, label="V2::Virtual"];
        "13" -> "5" [arrowhead = normal, penwidth = 1.0, color = black, label="V2"];
        "11" -> "12" [arrowhead = normal, penwidth = 1.0, color = black, label="V2"];
        "12" -> "13" [arrowhead = normal, penwidth = 1.0, color = black, label="V2"];
}

Here is what I've tried so far. I looked into pydot, a popular Python library for working with dot files, and wrote the following code, but couldn't get as far as extracting the labels.

import pydot

dot_string = """digraph "MVICFG" {
    label="MVICFG";
/* Generating Nodes */
    subgraph cluster_1 {
        label="main";
        "6" [label="4294967294::Entry::main"];
        "2" [label="0::  %1 = alloca i32, align 4"];
        "3" [label="0::  store i32 0, i32* %1, align 4"];
        "4" [label="0::  %2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([12 x i8], [12 x i8]* @.str, i32 0, i32 0)), !dbg !13"];
        "5" [label="4::  ret i32 0, !dbg !14"];
        "7" [label="4294967293::Exit::main"];
        "11" [label="3::  %1 = alloca i32, align 4"];
        "12" [label="3::  store i32 0, i32* %1, align 4"];
        "13" [label="3::  %2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str, i32 0, i32 0)), !dbg !13"];
    }
    subgraph cluster_9 {
        label="External_Node_Func";
        "10" [label="4294967294::External_Node"];
    }

/* Generating Edges */
        "2" -> "3" [arrowhead = normal, penwidth = 1.0, color = black, label="V1"];
        "3" -> "4" [arrowhead = normal, penwidth = 1.0, color = black, label="V1"];
        "6" -> "2" [arrowhead = normal, penwidth = 1.0, color = pink, label="V1::Virtual"];
        "5" -> "7" [arrowhead = normal, penwidth = 1.0, color = pink, label="V1,V2::Virtual"];
        "4" -> "5" [arrowhead = normal, penwidth = 1.0, color = black, label="V1"];
        "6" -> "11" [arrowhead = normal, penwidth = 1.0, color = pink, label="V2::Virtual"];
        "13" -> "5" [arrowhead = normal, penwidth = 1.0, color = black, label="V2"];
        "11" -> "12" [arrowhead = normal, penwidth = 1.0, color = black, label="V2"];
        "12" -> "13" [arrowhead = normal, penwidth = 1.0, color = black, label="V2"];
}

"""

graphs = pydot.graph_from_dot_data(dot_string)
graph = graphs[0]

Update 1:

If I am looking for the edges with the label "V1", I'd like output of this kind:

"2" -> "3" 
"3" -> "4" 
"4" -> "5" 

I can get that from the code that SultanOrazbayev posted by adding the following line:

G_sub.edges

Solution

  • Assuming the dot file is named test.dot, the following code uses networkx to load it (this also requires pydot to be installed) and then filters the edges, returning a view of the graph that contains only the edges with the desired label.

    from networkx import subgraph_view
    from networkx.drawing.nx_pydot import read_dot
    
    # load the dot file
    G = read_dot('test.dot')
    
    # define the function to filter edges
    def filter_edge(source, target, edge_id):
        """Keep only the edges whose label is exactly "V1".

        Note that this function hardcodes the desired edge label, and that
        the label is quoted twice to match the raw data."""
        return G[source][target][edge_id].get('label') == '"V1"'
    
    G_sub = subgraph_view(G, filter_edge=filter_edge)
    print(G_sub)
    # MultiDiGraph named 'MVICFG' with 10 nodes and 3 edges
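
The equality test above keeps only edges whose label is exactly "V1", so edges labelled "V1::Virtual" or "V1,V2::Virtual" are skipped. If those should count as well, the label can be split into its version tags first. This is only a sketch: it assumes, from the sample file alone, that a label has the form versions::kind with comma-separated versions, and label_versions and has_version are hypothetical helper names, not part of networkx or pydot.

```python
def label_versions(raw_label):
    """Return the set of version tags found in a raw DOT edge label.

    Assumed label format (inferred from the sample file only):
    '"V1,V2::Virtual"' -> {"V1", "V2"}; the part after '::' is ignored.
    """
    if raw_label is None:
        return set()
    # Drop surrounding whitespace and the double quotes kept from the file.
    text = raw_label.strip().strip('"')
    # Keep only the part before the first '::' separator.
    versions_part = text.split("::", 1)[0]
    return {v.strip() for v in versions_part.split(",") if v.strip()}


def has_version(raw_label, version):
    """True if the label lists the given version tag."""
    return version in label_versions(raw_label)


for raw in ['"V1"', '"V1::Virtual"', '"V1,V2::Virtual"', '"V2"']:
    print(raw, has_version(raw, "V1"))
```

Inside filter_edge, the comparison could then become has_version(G[source][target][edge_id].get('label'), "V1"), which on the sample graph would also keep the edges "6" -> "2" and "5" -> "7".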
    

    If you also want to remove the isolates, then use the relevant networkx function:

    from networkx import isolates, MultiDiGraph
    
    # make a modifiable copy of the graph
    G_sub = MultiDiGraph(G_sub)
    
    # identify which nodes to remove
    remove_nodes = list(isolates(G_sub))
    
    G_sub.remove_nodes_from(remove_nodes)
    print(G_sub)
    # MultiDiGraph named 'MVICFG' with 4 nodes and 3 edges
    

    Note that the result of isolates is stored in a list; this avoids iterating over the graph while it is being modified (see this PR and the associated GH issue).
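
If the goal is the path itself rather than a set of edges, the remaining edges can be chained into one node sequence. The following is a minimal sketch, assuming the filtered edges form a single unbranched chain (which holds for the exact "V1" edges in the sample). order_edges_into_path is a hypothetical helper, and the edge pairs are typed in by hand here rather than read from G_sub.

```python
def order_edges_into_path(edges):
    """Order (source, target) pairs into a single node path.

    Assumes the edges form one simple chain: no branching, no cycles,
    and no disconnected pieces. Raises ValueError otherwise.
    """
    edges = list(edges)
    successor = dict(edges)
    if len(successor) != len(edges):
        raise ValueError("a node has more than one outgoing edge")

    # The start of the chain is the only source that is never a target.
    targets = set(successor.values())
    starts = [node for node in successor if node not in targets]
    if len(starts) != 1:
        raise ValueError("expected exactly one start node")

    path = [starts[0]]
    seen = {starts[0]}
    while path[-1] in successor:
        nxt = successor[path[-1]]
        if nxt in seen:
            raise ValueError("the edges contain a cycle")
        seen.add(nxt)
        path.append(nxt)

    # Every edge must have been used, otherwise some edges are disconnected.
    if len(path) != len(edges) + 1:
        raise ValueError("the edges do not form a single chain")
    return path


# The three edges labelled exactly "V1" in the sample, in arbitrary order.
v1_edges = [("3", "4"), ("2", "3"), ("4", "5")]
print(" -> ".join(order_edges_into_path(v1_edges)))
# 2 -> 3 -> 4 -> 5
```

The explicit checks are there because a broader filter (for example one that also keeps the "Virtual" edges) may no longer produce a single chain, in which case raising an error seems safer than returning a partial path.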