python set time-complexity hash-collision

difference in time complexity using set's in method and list indexing(python)

you can see this problem in english by clicking button in top right corner(right after star shape)

problem Explanation: A single-player game is played on a rectangular board divided in R rows and C columns. There is a single uppercase letter (A-Z) written in every position in the board.

Before the begging of the game there is a figure in the upper-left corner of the board (first row, first column). In every move, a player can move the figure to the one of the adjacent positions (up, down, left or right). Only constraint is that a figure cannot visit a position marked with the same letter twice. The goal of the game is to play as many moves as possible.

Write a program that will calculate the maximal number of positions in the board the figure can visit in a single game.

I solve this problem by DFS, checking if i visited some alphabet by 26 size list, and updating global variable "ans" when depth of dfs exceeds "ans"

import sys

input = sys.stdin.readline

r, c= map(int, input().split())

graph = []
for _ in range(r):
    graph.append(list(input().strip()))

alphas = [0] * 26
visited = [[0] * c for _ in range(r)]

# 동 서 남 북
dr = [0, 0, 1, -1]
dc = [1, -1, 0, 0]

ans = 0
real_visited = [[0] * c for _ in range(r)]

def dfs(row, col, cnt):
    global ans

    if cnt > ans:
        ans = cnt

    visited[row][col] = 1
    alphas[ord(graph[row][col]) - ord('A')] = True

    for i in range(4):
        newR = row + dr[i]
        newC = col + dc[i]

        if 0 <= newR < r and 0 <= newC < c and not alphas[ord(graph[newR][newC]) - ord('A')] and not visited[newR][newC]:
            dfs(newR, newC, cnt + 1)


    alphas[ord(graph[row][col]) - ord('A')] = False
    visited[row][col] = 0    

dfs(0, 0, 1)

this code don't have any time complexity issue. but when I change list "alphas" into set, and check if i encountered some alphabet by "sth in set" time limit exceed occurs.

like this..

alphas = set()
.
.
alphas.add(graph[row][col])
.
.
graph[newR][newC] not in alphas

I thought the time complexity of both in operation(set), and indexing in list are O(1). I asked this to chatGPT, and gpt told me hash collision could happen, so that finding elements in set could consume more than O(1). I understand this mechanism to some extent.

However, there are only 26 characters, So I personally think there would be no something like hash collision(cause inputs are so small). So I wonder what really caused this diffrence in Time complexity. Is it really because of Hash collison? even with 26 inputs??

Thanks for reading.

Solution

Suppose that I write a simple algorithm to print a list.

for x in things:
    print(x)

Assuming that the print takes a fixed time, this takes time O(len(things)).

But I decide that the person running my program needs to learn some patience. So I rewrite it.

import time

for x in things:
    print(x)
    time.sleep(1)

This second program is dramatically slower because the program spends almost all of its time sleeping. But I only added O(len(things)) operations that each take fixed time, so its complexity remains O(len(things)).

This is an example of the maxim that big-O is not about how fast it is, it is about how well it will scale. O(1) operations are not created equal. If you use slow ones, your program will be slow.

And so it is with set membership and dictionary lookups. They are O(1) on average. But they are also much more complex than index lookups into an array. And therefore they are much slower. When you switched to using one you slowed your program down, even though its average performance complexity remained the same.