Search code examples
pythonbinary-search

Binary search algorithm for 2D array Python


I have been taught the 1D array way of doing binary search:

def exist(target, array):
  lower = -1 
  upper = len(array)
  while not (lower + 1 == upper):
    mid = (lower + upper)//2
    if target== array[mid]: 
      return True
    elif target< array[mid]: 
      upper = mid 
    else: 
      lower = mid 
  return False

However, now I am faced with this 2D array list2 that takes in employee_id and employee_birthyear:

[[2, 1986],
 [4, 1950],
 [6, 1994],
 [9, 2004],
 [12, 1988],
 [13, 1964],
 [16, 1987],
 [18, 1989],
 [19, 1951],
 [20, 1991]]

I want to write a function using the above pure binary search algorithm that takes in a year (as an integer) and list2, and returns a list of employee_id with matching employee_birthyear.

How do i do this?

Here's what i have thought of:

lst2 = [ j for i in list2 for j in i]

def get_IDs_with_birthyear(year, lst2):
    lower = -1 
  upper = len(lst2)
  while not (lower + 1 == upper): 
    mid = (lower + upper)//2
    if year = lst2[mid]:
      return mid 
  return []

Updates: I tried sorting my year and doing the binary search, but when there are multiple id with the same year, I am unable to retrieve all the ids.

ids = [] 
def get_IDs_with_birthyear(year, employee_with_birthyear_list):
  data2 = sorted(employee_with_birthyear_list, key=lambda d: d[1])
  years = [d[1] for d in data2]
  id = [d[0] for d in data2]
  
  lower = -1 
  upper = len(years)
  while not (lower + 1 == upper): 
    mid = (lower + upper)//2
    if year == years[mid]:
      ids.append(id[mid])
      return ids
    elif year < years[mid]:
      upper = mid 
    else: 
      lower = mid
  return False 

Result is supposed to get [101, 201, 1999632, 1999649], but instead i got only [1999632]

result = get_IDs_with_birthyear(1949, ewby)

What am i doing wrong in my function?


Solution

  • The binary search algorithm needs the data to be sorted on the key you are searching on. In this case your 2D list is not sorted by year so you will need a different approach (or you'll need to resort the list on years instead of employee IDs).

    One option would be to build a dictionary with the year as key:

    years = dict()
    for emp,birth in employee_birthyear:
        years.setdefault(birth,[]).append(emp)
    

    Then you can get the list of employee IDs for any given birth year:

    years[1950] # [4]
    

    Note that you get a list of employee IDs because you could have more than one employee born the same year

    Once the dictionary is build (an O(N) operation as opposed to O(NlogN) for a sort), all accesses by year will return in O(1). This avoids the need to change the order of elements in the employee_birthyear list and has lower complexity than a binary search (which is O(logN))