algorithm grid path-finding disjoint-sets disjoint-union

Connecting disjoint sets in a 2D array

I am trying to generate a random grid with positions that are Traversable and Non-Traversable and ensure that there is a path from one Traversable position to any other Traversable position in one of the 4 directions {Right, Up, Left, Down}. Traversable positions are represented as "[ ]" and Non-Traversable positions are represented as "[X]"

Here is a grid I have generated: 
[ ][ ][ ][ ][ ][ ][ ][ ][X][ ][ ][X][ ][X]
[ ][ ][X][ ][ ][ ][X][ ][ ][X][X][ ][ ][ ]
[X][ ][ ][ ][ ][X][X][X][ ][ ][ ][X][ ][ ]
[ ][ ][ ][ ][ ][X][ ][ ][ ][X][ ][ ][X][ ]
[ ][X][ ][ ][ ][X][ ][ ][ ][ ][X][X][X][X]
[ ][ ][X][X][X][ ][ ][ ][X][X][X][X][X][X]
[ ][X][ ][ ][ ][X][ ][ ][ ][X][X][ ][ ][X]
[ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][X][ ][ ][ ]
[ ][ ][X][ ][ ][ ][X][ ][X][X][ ][ ][ ][ ]
[ ][X][ ][X][ ][ ][ ][ ][ ][ ][X][X][ ][ ]
[ ][ ][ ][ ][ ][ ][ ][ ][X][ ][ ][X][X][ ]

What algorithm can I use to find the disjoint sets in my grid and create a path between the disjoint sets? Thanks!

Solution

To find the disjoint components, you can use a breadth-first search (with a queue) or depth-first search (with a stack) starting at any traversable position. When the search terminates, it will have marked an entire component. Then if there are unmarked positions, use those as starting points for another search, etc., until you have marked all traversable positions.

To figure out which non-traversable positions have to be removed, if you wish to remove (nearly) as few as possible, think of each of the "disjoint sets" (better to call them "connected components") as single nodes in a graph, and look at a wide variety of paths connecting them. Count the number of X's that have to be removed for each path connecting a node to another node, and use that as the weight of an edge in a graph. Then you want to find the minimal spanning tree of that graph, using, e.g., Kruskal's algorithm.

That method is not guaranteed to find the minimum number of X's to remove to connect the traversable positions; for example, in the graph you gave, removing a single X near the top right corner connects three components, whereas my suggestion might result in removing two X's. However, your problem is vaguely specified, so I believe the difference is not important.

To find the exact minimum number of X's to remove, you would have to solve the "node-weighted Steiner tree problem" which in general is NP-hard, I believe. You may be able to get a good approximation, given that your graphs are planar: http://www-math.mit.edu/~hajiagha/NodePlanarSteiner.pdf.