Search code examples
algorithmgraph-theorydirected-graph

Propagating traits in a directed graph


I have a graph (<1 million vertices) with directed edges. The graph is probably disconnected and may contain loops. There may be edges going both ways between two vertices.
Each vertex can have 0-3 colors: red, green, blue. You may think of these as separate traits of a vertex, or you may think of them as the color of the vertex. (Black for no colors, yellow for R+G, cyan for G+B, magenta for R+B, and white for all colors.)

I need to transform the vertices of this graph in the following way:
For all pairs of vertices (u, v), if there exists a path from u to v, vertex v should get all the colors of vertex u in addition to its own colors.

Is there an algorithm for doing this faster than O(n^2) with respect to the amount of vertices?

If I did not make a mistake, then an example O(n^2) algorithm would be:
For each vertex u, add all colors of u to all vertices reachable from u. Finding all such vertices can be done using a depth-first search, keeping track of visited and queued vertices.

While I feel there should be a speed improvement somewhere, considering we can keep track of colors we've already visited, and applying all of them to all the vertices on the remainder of the path we're walking, I can't figure out the condition for stopping, nor the choice of vertices to start walking from in this case.


Solution

  • Your algorithm would actually be a little slower than O(n^2), as a depth-first search through the graph is linear in the number of edges, which is worst case squared in the number of vertices. Doing this for every vertex gives a total complexity of O(n^3).

    Doing it faster is indeed possible. Note that each Strongly Connected Component will exchange each of its colors between all its vertices. Thus, each of these components can be considered a single vertex.

    To find each Strongly Connected Component, you can use Tarjan's algorithm. You can then form a new graph in which each of these components is replaced by a single node (with all the colors in that component). This new graph is called the condensation of the original graph. This new condensed graph will not have any cycles because we've eliminated the connected components, and so we can sort it topologically. In a topologically sorted graph we only need to do a single forward pass to propagate all the colors. Finally, we reconstruct the uncondensed graph by substituting the components back in.

    That's a lot of steps, but they're all fairly straightforward and each of them has worst case O(n^2) complexity, so the total procedure is also O(n^2).

    Note that O(n^2) is still pretty big for a million vertices. However, it's actually linear in the amount of edges (or the amount of vertices if that is somehow higher). You can't possibly do better than that, as that is the size of the input and you'll need to read the whole input graph regardless.