Search code examples
algorithmimage-processingmultidimensional-arrayareaneighbours

Connected Component Labeling - Implementation


I have asked a similar question some days ago, but I have yet to find an efficient way of solving my problem. I'm developing a simple console game, and I have a 2D array like this:

1,0,0,0,1
1,1,0,1,1
0,1,0,0,1
1,1,1,1,0
0,0,0,1,0

I am trying to find all the areas that consist of neighboring 1's (4-way connectivity). So, in this example the 2 areas are as following:

1
1,1
  1
1,1,1,1
      1

and :

       1
     1,1
       1

The algorithm, that I've been working on, finds all the neighbors of the neighbors of a cell and works perfectly fine on this kind of matrices. However, when I use bigger arrays (like 90*90) the program is very slow and sometimes the huge arrays that are used cause stack overflows.

One guy on my other question told me about connected-component labelling as an efficient solution to my problem.

Can somebody show me any C++ code which uses this algorithm, because I'm kinda confused about how it actually works along with this disjoint-set data structure thing...

Thanks a lot for your help and time.


Solution

  • I'll first give you the code and then explain it a bit:

    // direction vectors
    const int dx[] = {+1, 0, -1, 0};
    const int dy[] = {0, +1, 0, -1};
    
    // matrix dimensions
    int row_count;
    int col_count;
    
    // the input matrix
    int m[MAX][MAX];
    
    // the labels, 0 means unlabeled
    int label[MAX][MAX];
    
    void dfs(int x, int y, int current_label) {
      if (x < 0 || x == row_count) return; // out of bounds
      if (y < 0 || y == col_count) return; // out of bounds
      if (label[x][y] || !m[x][y]) return; // already labeled or not marked with 1 in m
    
      // mark the current cell
      label[x][y] = current_label;
    
      // recursively mark the neighbors
      for (int direction = 0; direction < 4; ++direction)
        dfs(x + dx[direction], y + dy[direction], current_label);
    }
    
    void find_components() {
      int component = 0;
      for (int i = 0; i < row_count; ++i) 
        for (int j = 0; j < col_count; ++j) 
          if (!label[i][j] && m[i][j]) dfs(i, j, ++component);
    }
    

    This is a common way of solving this problem.

    The direction vectors are just a nice way to find the neighboring cells (in each of the four directions).

    The dfs function performs a depth-first-search of the grid. That simply means it will visit all the cells reachable from the starting cell. Each cell will be marked with current_label

    The find_components function goes through all the cells of the grid and starts a component labeling if it finds an unlabeled cell (marked with 1).

    This can also be done iteratively using a stack. If you replace the stack with a queue, you obtain the bfs or breadth-first-search.