I am trying to build a dictionary from scratch in Python. I have done most of the work but am stuck on a little problem. First, I will start by saying that I am using the inbuilt Python hash() to get the hash_result of the key (key can be int or str) then the index is formed by the hash_result % capacity of dictionary. If the key is a string of characters, everything works fine. As soon as the key is an integer, my custom dictionary breaks. Sometimes everything works, other times, the key gets hash value 0 (for instance) when adding to the dictionary, but the same key returns hash value 4 (for instance) when searching for the key in the dictionary which returns a KeyError since the key is mapped at index 0 and not 4. I believe that at first, the index is calculated by hash(key) % capacity(4 for instance), but as soon as the capacity gets increased x2, the index that is returned by the function hash(key) % capacity(now 8 because x2) is different which results in the problem. I saw this formula in Wikipedia (hash(key) % capacity). I am interested in learning if this is the problem I am facing or if not, what is actually causing this unwanted behavior and how to tackle it. Here is my code below:
class MyDictionary:
__LOAD_FACTOR_LIMIT = 0.75
__DEFAULT_CAPACITY = 4
def __init__(self):
self.__capacity = self.__DEFAULT_CAPACITY
self.__keys = [[] for i in range(self.__capacity)]
self.__values = [[] for i in range(self.__capacity)]
@property
def keys(self):
return [item for current_list in self.__keys for item in current_list]
@property
def values(self):
return [value for value_list in self.__values for value in value_list]
def __setitem__(self, key, value):
while self.__compute_load_factor() >= self.__LOAD_FACTOR_LIMIT:
self.__extend_dict()
index_hash = self.__hash_function(key)
if self.__is_key_in_dict(index_hash, key):
self.__set_value_to_an_existing_key(index_hash, key, value)
return
self.__set_value_to_a_new_key(index_hash, key, value)
def __getitem__(self, key):
index_hash = self.__hash_function(key)
if self.__is_key_in_dict(index_hash, key):
index_bucket = self.__get_index_bucket(index_hash, key)
return self.__values[index_hash][index_bucket]
raise KeyError('Key is not in dictionary!')
def __str__(self):
key_values = zip(self.keys, self.values)
result = '{' + ", ".join([f"{key}: {value}"
if isinstance(key, int) else f"'{key}': {value}"
for key, value in key_values]) + '}'
return result
def __hash_function(self, key):
index_hash = hash(key) % self.__capacity
return index_hash
def __is_key_in_dict(self, index_hash, key):
if key in self.__keys[index_hash]:
return True
return False
def __get_index_bucket(self, index_hash, key):
index_bucket = self.__keys[index_hash].index(key)
return index_bucket
def __extend_dict(self):
self.__keys += [[] for i in range(self.__capacity)]
self.__values += [[] for i in range(self.__capacity)]
self.__capacity *= 2
def __set_value_to_an_existing_key(self, index_hash, key, value):
index_bucket = self.__get_index_bucket(index_hash, key)
self.__values[index_hash][index_bucket] = value
def __set_value_to_a_new_key(self, index_hash, key, value):
self.__keys[index_hash].append(key)
self.__values[index_hash].append(value)
def __compute_load_factor(self):
k = len(self.__keys)
n = len([bucket for bucket in self.__keys if bucket])
return n / k
def get(self, key, return_value=None):
try:
index_hash = self.__hash_function(key)
index_bucket = self.__get_index_bucket(index_hash, key)
if self.__is_key_in_dict(index_hash, key):
return self.__keys[index_hash][index_bucket]
raise KeyError('Key is not in dictionary!')
except KeyError:
return return_value
def add(self):
pass
def pop(self):
pass
def clear(self):
self.__capacity = self.__DEFAULT_CAPACITY
self.__keys = [[] for i in range(self.__capacity)]
self.__values = [[] for i in range(self.__capacity)]
def items(self):
zipped_key_value = zip(self.keys, self.values)
return [item for item in zipped_key_value]
dictionary = MyDictionary()
dictionary.add()
dictionary[4] = 'hey'
dictionary['2'] = 'cya'
dictionary['4'] = 'welcome'
dictionary['5'] = 'welcome'
dictionary['32'] = 'heya'
dictionary['31'] = 'heya'
dictionary['36'] = 'heya'
dictionary['34'] = 'heya'
print(dictionary[4])
This is because you increase the capacity (stored in the __capacity
attribute) by calling the __extend_dict
method when the load is over a threshold, which makes the indices of the buckets in which the existing values are stored no longer valid, since you always derive the indices by taking the modulo of the capacity.
You should therefore re-insert the existing keys and values at their new indices every time you increase the dict's capacity:
def __extend_dict(self):
self.__capacity *= 2
new_keys = [[] for _ in range(self.__capacity)]
new_values = [[] for _ in range(self.__capacity)]
for keys, values in zip(self.__keys, self.__values):
for key, value in zip(keys, values):
index_hash = self.__hash_function(key)
new_keys[index_hash].append(key)
new_values[index_hash].append(value)
self.__keys = new_keys
self.__values = new_values