Search code examples
pythonarraysrangebinary-search

Get the number of integers between each given range - without external packages


The task:

Write a function that receives 3 lists and returns an array. The first list contains n integers, their values range between 0 and 10^9. "numbers".
The second list is a low-range list, which contains the lower end of a range, it contains q integers. "low".
The third list is a high-range list, which contains the higher end of a range, it contains q integers. "high".

The function should return a list that contains the number of integers in the first list, that fall in its range, given by the low-range and high-range lists.

In the returned list, at index i, there should be the number of integers in "numbers" which are bigger or equal to low[i] and smaller or equal to high[i].
You can only import math, no other imports are allowed
the list may not be sorted

Examples:
count_range([12,13,14,15,17],[14],[14]) should return [1]
count_range([12,13,14,15,17],[14,15],[14,18]) should return [1,2]
count_range([12,13,14,15,17],[12],[17]) should return [5]

This is my solution but it's not efficient enough, I need ways to optimize it or solve it differently without having to import any external packages.

def binarySearch(data, val):
    highIndex = len(data) - 1
    lowIndex = 0
    while highIndex > lowIndex:
        index = math.ceil((highIndex + lowIndex) / 2)
        sub = data[index]
        if sub > val:
            if highIndex == index:
                return sorted([highIndex, lowIndex])
            highIndex = index
        else:
            if lowIndex == index:
                return sorted([highIndex, lowIndex])
            lowIndex = index
    return sorted([highIndex, lowIndex])


def count_range(numbers, low, high):
    numbers.sort()
    result = []
    low_range_dict = {}
    high_range_dict = {}
    for i in range(len(numbers)):
        if numbers[i] not in low_range_dict:
            low_range_dict[numbers[i]] = i
        high_range_dict[numbers[i]] = i
    for i in range(len(low)):
        low_r = low[i]
        high_r = high[i]
        if low_r not in low_range_dict:
            low_range_dict[low_r] = binarySearch(numbers, low_r)[0]
            high_range_dict[low_r] = low_range_dict[low_r]
        low_index = low_range_dict.get(low_r)
        if high_r not in high_range_dict:
            high_range_dict[high_r] = binarySearch(numbers, high_r)[0]
            low_range_dict[high_r] = high_range_dict[high_r]
        high_index = high_range_dict.get(high_r)
        if low_r in numbers or low_r < numbers[0]:
            low_index -= 1
        result.append(high_index - low_index)
    return result

Solution

  • If we could use any module from the standard library, we could do write a very simple solution.

    from bisect import bisect_left
    from functools import lru_cache, partial
    
    def count_range(numbers, lows, highs):
        index = lru_cache()(partial(bisect_left, sorted(numbers)))
        return [index(hi + 1) - index(lo) for (lo, hi) in zip(lows, highs)]
    

    But we can write our own (simplified) equivalent of partial, lru_cache and bisect_left, so the imports are not needed.

    It is less complicated than your original code, and should probably run faster, but I don't know how big the difference is.

    We'll use a simpler bisect function for the binary search. And we don't need two different memoization dictionaries for high and low range.

    # This bisect is based on the reference implementation in the standard library.
    # in cpython this is actually implemented in C, and is faster.
    def bisect_left(a, x):
        """Return the index where to insert item x in list a, assuming a is sorted."""
        lo, hi = 0, len(a)
        while lo < hi:
            mid = (lo + hi) // 2
            if a[mid] < x:
                lo = mid + 1
            else:
                hi = mid
        return lo
    
    
    def count_range(numbers, lows, highs):
        numbers.sort()
        # instead of both low_range_dict and high_range_dict
        # we only need a single memoization dictionary. 
        # We could also use @functools.cache from the standard library
        memo = {}
        def index(val):
            """Memoized bisect"""
            if not val in memo:
                memo[val] = bisect_left(numbers, val)
            return memo[val]
    
        return [index(hi + 1) - index(lo) for (lo, hi) in zip(lows, highs)]