Search code examples
algorithmoptimizationgraphgraph-algorithm

Given undirected graph when removing edges one-by-one verify if removed one was a bridge and if so - the vertices of both parts


I am looking for a time-effective algorithm for this particular problem:

I have undirected graph with up to 10,000 vertexes and about 1-10 edges out of given vertex.

Now I will remove chosen edge from graph and I want to know if the edge I just removed was a bridge - and if so, what are the vertices connected on both sides. I will repeat the step of removing edge frequently, possibly till I get 10 000 disconnected vertices (each time I need the information of bridge). So, for example last edge should inform that it was a bridge with a single vertex on one side and a single vertex on the other.

Preprocessing of data is fully acceptable, the memory costs - within reasonable limits of modern PC - are okay. I am looking for an algorithm that optimize time of edge removal operation.

My tool of trade is C#, but any pseudo-code or idea I would gladly accept


Solution

  • A close relative of this problem is studied in the literature under the name "decremental connectivity". The main difference is that you want to enumerate the new connected components instead of being able to answer connectivity queries.

    This paper seems to be the theoretical state of the art, but having skimmed it, I don't think that the new ideas will be useful to you, since you have a graph that's

    • Sparse
    • Likely too small for the asymptotic advantages to kick in
    • Right at that size where with a good representation it fits into L1 cache, but large enough that you can't also fit fancy data structures.

    My recommendation would be a simplified version of Even--Shiloach where the BFS part is omitted (it adds complexity and memory consumption but doesn't seem obviously winning in a sparse graph). The idea is to maintain

    • the residual graph
    • a spanning forest of the residual graph
    • a map from vertices to labels representing the connected components.

    If an edge not in the forest gets deleted, then we just delete it. Otherwise, we pick one side of the tree that just got cut (doesn't matter for correctness which side, but you want the smaller one for efficiency; I recommend rooting the spanning trees and using the child), traverse it to provisionally update its label to something new, and then scan all of its edges to determine whether the edge that was just deleted is a bridge to the rest of the tree. If it is, great; traverse the other new tree for reporting the vertex sets. Otherwise, we have to fix up the labels and the tree structure.

    For the data structures I recommend a compact adjacency list. Since there are at most 10,000 nodes, a node index will fit in a 16-bit short. There are up to 100,000 half-edges, unfortunately, so we'll use a 32-bit int to represent an edge index. We use an int array to point into a short array; entry 2*v is the beginning of the adjacency list for node v, and entry 2*v+1 is the end. For cache locality we store two extra pieces of data alongside the adjacency list: the connected component label, and the number of descendants in the spanning forest. Order the spanning forest descendants before the other neighbors.

    Overall here is what this looks like on a simple graph:

    Graph:
       0
      / \
     /   \
    1-----3     2
    
    Spanning forest:
       0
      /
     /
    1-----3     2
    
    Arrays:
    [|   |   |   |   |   |   |   |  ]
     |   |   |    \  |   |   |   |
     |   |   |     \  \  |   |    \
     |    \   \     \  | |   |     |
     v     v   v     v v v   v     v
    [0 1 1 3 x 0 1 3 0 2 0 x 0 0 0 1]
               ^ ^ \ /
               | |  |
               | |  adjacency list; descendants (3) before others (0)
               | |
               | number of spanning forest descendants
               |
               connected component label
                 = root of tree in spanning forest
    

    I left xs to represent a previously deleted edge from 0 to 2.

    (I don't have a lot of experience with C#, so it's possible that the mandatory bounds checking will be a problem, but I can't imagine that pointers would be better, especially on a 64-bit machine.)