I have a function that takes a sorted list of values and assigns an ID to each element, where equal values share the same ID. For example:
input : [2,2,3,4,5,5,5,6]
output: [0,0,1,2,3,3,3,4]
To do that, I wrote the following function a while ago:
def mapHashedIdsToIds(hashed_ids):
    unique_section_ids = []
    uniqueHashedIds = []
    idMapping = {}
    currentId = 0
    # First pass: give each value seen for the first time the next ID.
    for element in hashed_ids:
        if element not in idMapping:
            idMapping[element] = currentId
            currentId += 1
            uniqueHashedIds.append(element)
    # Second pass: look up the ID for every element of the input.
    for element in hashed_ids:
        unique_section_id = idMapping[element]
        unique_section_ids.append(unique_section_id)
    return unique_section_ids
unique_ids = mapHashedIdsToIds(uniqueAbschnittsId)
Now, upon revisiting the code, I aimed to simplify the function with the following attempt:
def Generate_unique_sectionid(hashed_ids):
    already_used_hashed_ids = []
    unique_section_ids = []
    for element in hashed_ids:
        if element not in already_used_hashed_ids:
            already_used_hashed_ids.append(element)
            unique_section_ids.append(len(already_used_hashed_ids) - 1)
        else:
            # Equal values are adjacent (the input is sorted), so the
            # index of the last appended value is still the right ID.
            unique_section_ids.append(len(already_used_hashed_ids) - 1)
    return unique_section_ids
unique_ids = Generate_unique_sectionid(uniqueAbschnittsId)
I tested both functions using the %timeit command in Jupyter Notebook. Surprisingly, my new function took 2.8 seconds to execute, while the old one only took 25.1 milliseconds. What could be causing this significant difference in computing time? What am I overlooking? I would have expected my new approach to be much faster since it only involves one loop, no dictionary, and no counter variable.
I appreciate your thoughts!
In the first function you are using a dictionary. Dictionary lookup and insertion are O(1) on average, so the overall time complexity of the function is O(n) for n elements.

In the second function you are using a list. The `not in` test on a list is a linear scan over everything collected so far, O(n), and you run it once per input element, so the overall time complexity is O(n^2) for n elements. That quadratic scan, not the number of loops or variables, is what turns 25.1 ms into 2.8 s.
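You can see the quadratic growth directly. Here is a quick sketch using the standard timeit module, assuming your second function is already defined; the input sizes are arbitrary, and a range makes every value distinct, which is the worst case for the list version:

import timeit

for n in (500, 1000, 2000):
    data = list(range(n))  # sorted input, all values distinct
    t = timeit.timeit(lambda: Generate_unique_sectionid(data), number=10)
    print(n, round(t, 3))  # doubling n roughly quadruples the runtime

If you like the single-loop style, you can keep it and still stay O(n) by using a dictionary for the membership test. A minimal sketch (the function name is mine, not from your code):

def generate_unique_section_ids(hashed_ids):
    id_mapping = {}  # value -> ID; dict membership tests are O(1) on average
    unique_section_ids = []
    for element in hashed_ids:
        if element not in id_mapping:
            # Each new distinct value gets the next consecutive ID.
            id_mapping[element] = len(id_mapping)
        unique_section_ids.append(id_mapping[element])
    return unique_section_ids

print(generate_unique_section_ids([2, 2, 3, 4, 5, 5, 5, 6]))  # [0, 0, 1, 2, 3, 3, 3, 4]

This merges your two loops into one and drops the separate counter while keeping the dictionary that makes the old version fast. Since your input is sorted, itertools.groupby would also work, but the dictionary version does not depend on the input being sorted.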