Search code examples
c++c++20graphvizboost-graph

How do I extract a subgraph from a directed graph using Boost.Graph?


I have a very large graph dot file and want to extract a subgraph from a given vertex. This subgraph should contain the given vertex and all vertexes below it.

#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/depth_first_search.hpp>
#include <boost/graph/graphviz.hpp>

#include <iostream>

struct Vertex {
    std::string node;
};

using Graph = boost::adjacency_list<boost::vecS, boost::vecS, boost::directedS, Vertex>;

template <typename Fn>
requires std::is_invocable_r_v<bool, Fn, Graph::vertex_descriptor>
void removeVertexIf(Graph& graph, Fn const& fn) {
    Graph::vertex_descriptor count = num_vertices(graph);
    Graph::vertex_descriptor i = 0;
    while (i < count) {
        if (fn(i)) {
            clear_vertex(i, graph);
            remove_vertex(i, graph);
            --count;
        }
        else {
            ++i;
        }
    }
}

struct DFSVisitor : boost::default_dfs_visitor {
    DFSVisitor(std::vector<bool>& reachable)
        : reachable(reachable)
    {}

    void discover_vertex(Graph::vertex_descriptor const index, Graph const&) {
        reachable[index] = true;
    }

    std::vector<bool>& reachable;
};

void removeUnreachable(Graph& graph, Graph::vertex_descriptor const start_index) {
    std::vector<bool> reachable(num_vertices(graph), false);

    DFSVisitor visitor(reachable);

    depth_first_search(graph, boost::visitor(visitor).root_vertex(start_index));

    removeVertexIf(graph, [&](Graph::vertex_descriptor const index) {
        return reachable[index];
    });
}

int main() {
    std::istringstream input(
        "digraph{"
        "0;1;2;3;4;5;6;7;8;9;"
        "0->1;1->2;2->3;2->6;3->4;4->5;5->8;6->7;6->5;7->8;8->9;"
        "}");

    Graph g(0);

    boost::dynamic_properties dp(boost::ignore_other_properties);
    dp.property("node_id", get(&Vertex::node, g));

    boost::read_graphviz(input, g, dp);

    // delete everything that is not below Node 6
    removeUnreachable(g, 6);

    boost::write_graphviz(std::cout, g);
}

In this minimal example, the given vertex has the node ID 6. The following graphic shows what I want to extract:

What I have and what I want.

How can I remove the nodes and edges that are not below Node 6? My current removeUnreachable iterates over the entire graph instead of starting at start_index by depth_first_search.

Dot file to SVG graphic:

dot -Tsvg out.dot -o out.svg

Solution

  • You already figured out that depht_first_search does more than you thought it did. Instead of complicating the visitor, I'd suggest to use depht_first_visit instead: https://www.boost.org/doc/libs/1_83_0/libs/graph/doc/depth_first_visit.html

    Other Issues

    Adjacency lists with vertex container selector vecS have an implied contiguous integral vertex index, which doubles as the descriptor in that case. You must have been somewhat aware of this because your spelled it out:

    Graph::vertex_discriptor count = num_vertices(graph);
    

    When you remove an early vertex, you are effectively renumbering all vertices with higher index. This make it so that your removeVertexIf loop invalidates the values inside the reachable map.

    One way to avoid this would be to go by the name property (Vertex::node in your example). Another way is to renumber your unreachable entries in parallel with the removal, but this breaks the encapsulation of the predicate function: the predicate now must know about the remove algorithms internals.

    Another option, of course, would be to have a (temporary) extra mapping that indirects the original vertex index to the current index.

    Lastly you could select a vertex container that has reference and descriptor stability (e.g. setS and listS).

    Note that it may be much more performant to NOT REMOVE any vertices, instead just filtering them out on the fly. I'll present this as the BONUS take below

    Plenty of options, let's go with the simplest:

    ListS

    See it Live On Coliru

    #include <boost/graph/adjacency_list.hpp>
    #include <boost/graph/depth_first_search.hpp>
    #include <boost/graph/graphviz.hpp>
    #include <iostream>
    
    using Graph = boost::adjacency_list<boost::vecS, boost::listS, boost::directedS, //
                                        boost::property<boost::vertex_index_t, size_t>>;
    using V     = Graph::vertex_descriptor;
    using VSet  = std::set<V>;
    
    template <typename Pred>
        requires std::predicate<Pred, V>
    void removeVertexIf(Graph& g, Pred pred) {
        for (auto [it, e] = vertices(g); it != e;) {
            if (auto v = *it++; pred(v)) {
                clear_vertex(v, g);
                remove_vertex(v, g);
            }
        }
    }
    
    struct DFSVisitor : boost::default_dfs_visitor {
        DFSVisitor(VSet& reachable) : reachable(reachable) {}
        void discover_vertex(V v, Graph const& g) const {
            std::cout << "Marking " << get(boost::vertex_index, g, v) << " reachable\n";
            reachable.insert(v);
        }
    
        VSet& reachable;
    };
    
    void removeUnreachable(Graph& g, V start) {
        VSet reachable;
        std::vector<boost::default_color_type> colors(num_vertices(g));
        DFSVisitor visitor(reachable);
    
        auto idx = get(boost::vertex_index, g);
        depth_first_visit(g, start, visitor, make_iterator_property_map(colors.begin(), idx));
    
        removeVertexIf(g, [&](V v) { return !reachable.contains(v); });
    }
    
    int main() {
        Graph g;
        auto idx = get(boost::vertex_index, g);
    
        {
            std::istringstream        input("digraph{0->1;1->2;2->3;2->6;3->4;4->5;5->8;6->7;6->5;7->8;8->9;}");
            boost::dynamic_properties dp(boost::ignore_other_properties);
            dp.property("node_id", idx);
    
            read_graphviz(input, g, dp);
        }
    
        // delete everything that is not below Node 6
        auto v6 = vertex(6, g);
        assert(idx[v6] == 6);
        removeUnreachable(g, v6);
    
        write_graphviz(std::cout, g);
    }
    

    Of course you can make it work for vecS iff you remove in the correct order:

    template <typename Pred>
        requires std::predicate<Pred, V>
    void removeVertexIf(Graph& g, Pred pred) {
        for (auto v : boost::adaptors::reverse(vertices(g))) {
            if (pred(v)) {
                clear_vertex(v, g);
                remove_vertex(v, g);
            }
        }
    }
    

    Note though that you get the output you should expect: Live On Coliru

    digraph G {
    0;
    1;
    2;
    3;
    4;
    0->3 ;
    1->2 ;
    1->0 ;
    2->3 ;
    3->4 ;
    }
    

    To keep the original node ids, make it explicit:

    write_graphviz_dp(std::cout, g, dp);
    

    Now printing Live On Coliru:

    digraph G {
    5;
    6;
    7;
    8;
    9;
    5->8 ;
    6->7 ;
    6->5 ;
    7->8 ;
    8->9 ;
    }
    

    BONUS: Why Remove At All?

    The performance of clear_vertex and remove_vertex is going to make you cry¹. Instead, just filter for your target vertices:

    Live On Coliru

    #include <boost/graph/adjacency_list.hpp>
    #include <boost/graph/depth_first_search.hpp>
    #include <boost/graph/filtered_graph.hpp>
    #include <boost/graph/graphviz.hpp>
    #include <iostream>
    
    struct VProp { size_t original_idx; };
    using Graph = boost::adjacency_list<boost::vecS, boost::vecS, boost::directedS, VProp>;
    using V     = Graph::vertex_descriptor;
    using VSet  = std::set<V>;
    
    struct DFSVisitor : boost::default_dfs_visitor {
        DFSVisitor(VSet& reachable) : reachable(reachable) {}
        void  discover_vertex(V v, Graph const&) const { reachable.insert(v); }
        VSet& reachable;
    };
    
    VSet reachable(Graph& g, V start) {
        VSet reachable;
        std::vector<boost::default_color_type> colors(num_vertices(g));
        DFSVisitor visitor(reachable);
    
        depth_first_visit(g, start, visitor, colors.data());
    
        return reachable;
    }
    
    int main() {
        Graph g;
        auto idx = get(&VProp::original_idx, g);
    
        boost::dynamic_properties dp(boost::ignore_other_properties);
        dp.property("node_id", idx);
    
        {
            std::istringstream input("digraph{0->1;1->2;2->3;2->6;3->4;4->5;5->8;6->7;6->5;7->8;8->9;}");
            read_graphviz(input, g, dp);
        }
    
        // filter subgraph below Node 6
        auto const vv = reachable(g, vertex(6, g));
        write_graphviz_dp(
            std::cout,
            boost::filtered_graph(g, boost::keep_all{}, std::function([&vv](V v) { return vv.contains(v); })),
            dp);
    }
    

    Printing, again:

    digraph G {
    5;
    6;
    7;
    8;
    9;
    5->8 ;
    6->7 ;
    6->5 ;
    7->8 ;
    8->9 ;
    }
    

    ¹ see e.g. Remove 100,000+ nodes from a Boost graph