Tags: python, algorithm, data-structures, depth-first-search, hashset

Using clear() vs copy() in a hashset


I was solving the Walls and Gates problem on LintCode.

I first came up with the following solution, but it failed some test cases:

from typing import List

class Solution:
    def wallsAndGates(self, rooms: List[List[int]]) -> None:
        """
        Do not return anything, modify rooms in-place instead.
        """
        ROWS, COLS = len(rooms), len(rooms[0])
        visited = set()
        def dfs(row, col, distance):
            if row not in range(ROWS) or col not in range(COLS):
                return
            if (row, col) in visited:
                return
            if rooms[row][col] == -1:
                return
            if distance > rooms[row][col]:
                return

            visited.add((row, col))
            rooms[row][col] = min(distance, rooms[row][col])

            dfs(row + 1, col, distance + 1)
            dfs(row - 1, col, distance + 1)
            dfs(row, col + 1, distance + 1)
            dfs(row, col - 1, distance + 1)

        for row in range(ROWS):
            for col in range(COLS):
                if rooms[row][col] == 0:
                    visited.clear()  # Clear the visited set before each traversal
                    dfs(row, col, 0)
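
For example, here is a minimal grid (my own construction, not one of LintCode's test cases) on which this first version gives a wrong answer. The DFS from the gate wanders down and around the grid first, marks the top-right cell visited at distance 3, and the direct distance-1 path from the gate is then rejected by the visited check:

INF = 2147483647  # empty room; -1 would be a wall, 0 a gate

rooms = [[0, INF],
         [INF, INF]]
Solution().wallsAndGates(rooms)
print(rooms)  # [[0, 3], [1, 2]] -- but the expected result is [[0, 1], [1, 2]]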

Then I looked around to see why that happened, and I saw a suggestion that passing a separate copy of the hashset would fix it. I made that change and all the test cases passed:

class Solution:
    def wallsAndGates(self, rooms: List[List[int]]) -> None:
        """
        Do not return anything, modify rooms in-place instead.
        """
        ROWS, COLS = len(rooms), len(rooms[0])

        def dfs(row, col, distance, visited):
            if row not in range(ROWS) or col not in range(COLS):
                return
            if (row, col) in visited:
                return
            if rooms[row][col] == -1:
                return
            if distance > rooms[row][col]:
                return

            visited.add((row, col))
            rooms[row][col] = min(distance, rooms[row][col])

            dfs(row + 1, col, distance + 1, visited.copy())
            dfs(row - 1, col, distance + 1, visited.copy())
            dfs(row, col + 1, distance + 1, visited.copy())
            dfs(row, col - 1, distance + 1, visited.copy())

        for row in range(ROWS):
            for col in range(COLS):
                if rooms[row][col] == 0:
                    dfs(row, col, 0, set())
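
On the same 2x2 grid as above, this version produces the expected result, because each recursive branch gets its own copy of the set, so the short path from the gate is no longer blocked by cells visited along the longer path:

rooms = [[0, INF],
         [INF, INF]]
Solution().wallsAndGates(rooms)
print(rooms)  # [[0, 1], [1, 2]]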

I am confused about why passing visited.copy() solves the problem. I understand that visited.copy() passes a copy rather than a reference to the hashset. That makes sense, but in my original code I call visited.clear(), so doesn't that achieve the same thing? It would be really helpful if someone could explain this, as I am lost.

I would understand it if the dfs calls were happening in parallel, but they are not, so why do I need to pass visited.copy()?


Solution

  • That makes sense, but in my original code I call visited.clear(), so doesn't that achieve the same thing?

    No. You clear it at the start of each traversal, but the four recursive dfs calls inside that traversal still share and mutate one set. If a cell is first reached along a longer, winding path, it gets marked visited with that larger distance; a later, shorter path from the same gate is then blocked by the (row, col) in visited check and can never lower the value (this is exactly what happens to the top-right cell in your 2x2 example above).

    I would understand it if the dfs calls were happening in parallel, but they are not, so why do I need to pass visited.copy()?

    You don't need to pass it if you do BFS instead of DFS (a sketch follows at the end of this answer), or if you put visited.remove((row, col)) at the end of your dfs function, so the set only ever holds the cells on the current recursion path:

    visited.add((row, col))
    rooms[row][col] = min(distance, rooms[row][col])
    
    dfs(row + 1, col, distance + 1)
    dfs(row - 1, col, distance + 1)
    dfs(row, col + 1, distance + 1)
    dfs(row, col - 1, distance + 1)
    
    visited.remove((row, col))  # Backtrack: free this cell so other paths can revisit it
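
    For completeness, here is a sketch of the BFS alternative mentioned above (my own version, assuming LintCode's usual encoding: -1 = wall, 0 = gate, 2147483647 = empty room). A multi-source BFS expands from all gates at once and therefore reaches every cell at its minimum distance first, so the grid values themselves serve as the visited marker and no set is needed at all:

    from collections import deque
    from typing import List

    INF = 2147483647

    class Solution:
        def wallsAndGates(self, rooms: List[List[int]]) -> None:
            ROWS, COLS = len(rooms), len(rooms[0])
            # Seed the queue with every gate (multi-source BFS).
            queue = deque((r, c) for r in range(ROWS) for c in range(COLS)
                          if rooms[r][c] == 0)
            while queue:
                row, col = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    r, c = row + dr, col + dc
                    # Only step into untouched empty rooms; any cell that
                    # already has a distance was reached by a shorter path.
                    if 0 <= r < ROWS and 0 <= c < COLS and rooms[r][c] == INF:
                        rooms[r][c] = rooms[row][col] + 1
                        queue.append((r, c))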