Search code examples
c++graphboostparallel-processing

How to split set of vertices with Boost Graph?


I'm writing some algorithm in C++ for parallel graph coloring using Boost Graph and adjacency_list. I'm working with very big graph (the smallest has 32K vertices).

What I'm trying to do is to take the whole set of vertices, split them in parts and assign each part to a different thread and work in parallel, but I'm struggling with some passages.

The basic idea what this:

int step = g.m_vertices.size()/4;
int min = 0;

for(int i = 0; i < 4; i++){
    // call the function
}

And the function I call inside is something like that

for (vp = vertices(g); vp.first != vp.second; ++vp.first) {
    cout << *vp.first << endl;
}

So I have two questions:

  1. g.m_vertices.size()/4; is the right solutions? If initially I have 10 vertices, then I remove some vertex in the middle (e.g. 4), only 6 vertices left (so this is the new size) but the index of the vertices go from 0 to 5 or from 0 to 9?

  2. How can pass only a subset of vertices to vp instead of passing vertices(g)?


Solution

  • g.m_vertices.size()/4; is the right solutions?

    That depends ONLY on your requirements.

    If initially I have 10 vertices, then I remove some vertex in the middle (e.g. 4), only 6 vertices left (so this is the new size) but the index of the vertices go from 0 to 5 or from 0 to 9?

    That depends on your graph model. You don't specify the type of your graph (I know, you do say which template, but not the template parameters). Assuming vecS for the Vertex container selector, then yes, after 4 removals, the vertex descriptors (and index) will be [0,6).

    How can pass only a subset of vertices to vp instead of passing vertices(g)

    Many ways.

    1. you can std::for_each with a parallel execution policy
    2. you can use openmp to create a parallel section from the plain loop
    3. you can use filtered_graph adapter to create 4 "views" of the underlying graph and operate on those
    4. you can use PBGL which is actually created for dealing with huge graphs. This has the added benefit that it works with threading/interprocess/inter-host communication, can coordinate algorithms across segments etc.
    5. you can use sub_graphs; this is mainly (only) interesting if the way your graphs get built have a natural segmentation

    None of the solutions are trivial. But, here's naive demo using filtered_graph:

    Live On Compiler Explorer

    #include <boost/graph/adjacency_list.hpp>
    #include <boost/graph/filtered_graph.hpp>
    #include <boost/graph/random.hpp>
    #include <iostream>
    #include <random>
    
    using G = boost::adjacency_list<>;
    using V = G::vertex_descriptor;
    
    G make_graph() {
        G g;
        std::mt19937 prng(std::random_device{}());
        generate_random_graph(g, 32 * 1024 - (prng() % 37), 64 * 1024, prng);
        return g;
    }
    
    template <int NSegments, int Segment> struct SegmentVertices {
        std::hash<V> _h;
        bool operator()(V vd) const { return (_h(vd) % NSegments) == Segment; }
    };
    
    template <int N>
    using Quart = boost::filtered_graph<G, boost::keep_all, SegmentVertices<4, N>>;
    
    template <typename Graph>
    void the_function(Graph const& g, std::string_view name)
    {
        std::cout << name << " " << size(boost::make_iterator_range(vertices(g)))
                << " vertices\n";
    }
    
    int main()
    {
        G g = make_graph();
        the_function(g, "full graph");
    
        Quart<0> f0(g, {}, {});
        Quart<1> f1(g, {}, {});
        Quart<2> f2(g, {}, {});
        Quart<3> f3(g, {}, {});
    
        the_function(f0, "f0");
        the_function(f1, "f1");
        the_function(f2, "f2");
        the_function(f3, "f3");
    }
    

    Printing e.g.

    full graph 32766 vertices
    f0 8192 vertices
    f1 8192 vertices
    f2 8191 vertices
    f3 8191 vertices