I have been taught the 1D array way of doing binary search:
def exist(target, array):
lower = -1
upper = len(array)
while not (lower + 1 == upper):
mid = (lower + upper)//2
if target== array[mid]:
return True
elif target< array[mid]:
upper = mid
else:
lower = mid
return False
However, now I am faced with this 2D array list2
that takes in employee_id and employee_birthyear:
[[2, 1986],
[4, 1950],
[6, 1994],
[9, 2004],
[12, 1988],
[13, 1964],
[16, 1987],
[18, 1989],
[19, 1951],
[20, 1991]]
I want to write a function using the above pure binary search algorithm that takes in a year (as an integer) and list2
, and returns a list of employee_id with matching employee_birthyear.
How do i do this?
Here's what i have thought of:
lst2 = [ j for i in list2 for j in i]
def get_IDs_with_birthyear(year, lst2):
lower = -1
upper = len(lst2)
while not (lower + 1 == upper):
mid = (lower + upper)//2
if year = lst2[mid]:
return mid
return []
Updates:
I tried sorting my year
and doing the binary search, but when there are multiple id
with the same year, I am unable to retrieve all the ids.
ids = []
def get_IDs_with_birthyear(year, employee_with_birthyear_list):
data2 = sorted(employee_with_birthyear_list, key=lambda d: d[1])
years = [d[1] for d in data2]
id = [d[0] for d in data2]
lower = -1
upper = len(years)
while not (lower + 1 == upper):
mid = (lower + upper)//2
if year == years[mid]:
ids.append(id[mid])
return ids
elif year < years[mid]:
upper = mid
else:
lower = mid
return False
Result is supposed to get [101, 201, 1999632, 1999649], but instead i got only [1999632]
result = get_IDs_with_birthyear(1949, ewby)
What am i doing wrong in my function?
The binary search algorithm needs the data to be sorted on the key you are searching on. In this case your 2D list is not sorted by year so you will need a different approach (or you'll need to resort the list on years instead of employee IDs).
One option would be to build a dictionary with the year as key:
years = dict()
for emp,birth in employee_birthyear:
years.setdefault(birth,[]).append(emp)
Then you can get the list of employee IDs for any given birth year:
years[1950] # [4]
Note that you get a list of employee IDs because you could have more than one employee born the same year
Once the dictionary is build (an O(N) operation as opposed to O(NlogN) for a sort), all accesses by year will return in O(1). This avoids the need to change the order of elements in the employee_birthyear list and has lower complexity than a binary search (which is O(logN))