algorithm graph grid depth-first-search breadth-first-search

Find largest ones after setting a coordinate to one

Interview Question:

You are given a grid of ones and zeros. You can arbitrarily select any point in that grid. You have to write a function which does two things:

If you choose e.g. coordinate (3,4) and it is zero you need to flip that to a one. If it is a one you need to flip that to a zero.
You need to return the largest contiguous region with the most ones i.e. ones have to be at least connected to another one.

E.g.

[0,0,0,0]
[0,1,1,0]
[1,0,1,0]

We have the largest region being the 3 ones. We have another region which have only one one (found at coordinate (2,0)).

You are to find an algorithm that will solve this where you will call that function many times. You need to ensure that your amortized run time is the lowest you can achieve.

My Solution which has Time Complexity:O(num_row*num_col) each time this function is called:

def get_all_coordinates_of_ones(grid):
    set_ones = set()
    for i in range(len(grid[0])):
        for j in range(len(grid)):
            if grid[i][j]:
               set_ones.add((i, j))

    return set_ones

def get_largest_region(x, y, grid):

    num_col = len(grid)
    num_row = len(grid[0])
    one_or_zero = grid[x][y]

    if not grid[x][y]:
       grid[x][y] = 1 - grid[x][y]

    # get the coordinates of ones in the grid
    # Worst Case O(num_col * num_row)
    coordinates_ones = get_all_coordinates_of_ones(grid)

    while coordinates_ones:
       queue = collections.deque([coordinates_ones.pop()])
       largest_one = float('-inf')
       count_one = 1
       visited = set()
       while queue:
           x, y = queue.popleft()
           visited.add((x, y))
           for new_x, new_y in ((x, y + 1), (x, y - 1), (x + 1, y), (x - 1, y)):
               if (0 <= new_x < num_row and 0 <= new_y < num_col):
                   if grid[new_x][new_y] == 1 and (new_x, new_y) not in visited:
                       count_one += 1
                       if (new_x, new_y) in coordinates_ones:-
                           coordinates_ones.remove((new_x, new_y))
                       queue.append((new_x, new_y))
       largest_one = max(largest_one, count_one)
    return largest_one

My Proposed modifications:

Use Union Find by rank. Encountered a problem. Union all the ones that are adjacent to each other. Now when one of the coordinates is flipped e.g. from zero to one I will need to remove that coordinate from the region that it is connected to.

Questions are:

What is the fastest algorithm in terms of time complexity?
Using Union Find with rank entails removing a node. Is this the way to do improve the time complexity. If so, is there an implementation of removing a node in union find online?

------------------------ EDIT ---------------------------------

Should we always subtract one from the degree from sum(degree-1 of each 'cut' vertex). Here are two examples the first one where we need to subtract one and the second one where we do not need to subtract one:

Block Cut Tree example 1

Cut vertex is vertex B. Degree of vertex B in the block cut tree is 2.

Sum(cardinality of each 'block' vertex) : 2(A,B) + 1(B) + 3 (B,C,D) = 6

Sum(degree of each 'cut' vertex) : 1 (B)

Block cut size: 6 – 1 = 5 but should be 4 (A. B, C, D, E, F). Here need to subtract one more.

Block Cut Tree Example 2

Sum(cardinality of each 'block' vertex) : 3 (A,B,C) + 1(C) + 1(D) + 3 (D, E, F) = 8

Sum(degree of each 'cut' vertex) : 2 (C and D)

Block cut size: 8 – 2 = 6 which is (A. B, C, D, E, F). Here no need to subtract one.

Solution

Without preprocessing:

Flip the cell in the matrix.
Consider the matrix as a graph where each '1' represents a node, and neighbor nodes are connected with an edge.
Find all connected components. For each connected component - store its cardinality.
Return the highest cardinality.

Note that O(V) = O(E) = O(num_row*num_col).

Step 3 takes O(V+E)=O(num_row*num_col), which is similar to your solution.

You are to find an algorithm that will solve this where you will call that function many times. You need to ensure that your amortized run time is the lowest you can achieve.

That hints that you can benefit from preprocessing:

Preprocessing:

Consider the original matrix as a graph G where each '1' represents a node, and neighbor nodes are connected with an edge.
Find all connected components
Construct the set of block-cut trees (section 5.2) of G (also here, here and here) (one block-cut tree for each connected component of G). Construction: see here.

Processing:

If you flip a '0' cell to '1':

Find neighbor connected components (0 to 4)
Remove old block-cut trees, construct a new block-cut tree for the merged component (Optimizations are possible: in some cases, previous tree(s) may be updated instead of reconstructed).

If you flip a '1' cell to '0':

If this cell is a 'cut' in a block-cut tree:
- remove it from the block-cut-tree
- remove it from each neighbor 'cut' vertex
- split the block-cut-tree into several block-cut trees
Otherwise (this cell is part of only one 'block vertex')
- remove it from the 'block' vertex; if empty - remove vertex. If block-cut-tree empty - remove it from the set of trees.

The size of a block-cut tree = sum(cardinality of each 'block' vertex) - sum(neighbor_blocks-1 of each 'cut' vertex).

Block-cut trees are not 'well known' as other data structures, so I'm not sure if this is what the interviewer had in mind. If it is - they're really looking for someone well experienced with graph algorithms.