Search code examples
pythonarraysrangebinary-search

Get the number of integers between each given range


The task:

Write a function that receives 3 arrays and returns an array. The first array contains n integers, their values range between 0 and 10^9. "numbers". The second array is a low-range array, which contains the lower end of a range, it contains q integers. "low". The third array is a high-range array, which contains the higher end of a range, it contains q integers. "high".

The function should return an array that contains the number of integers in the first array, that fall in its range, given by the low-range and high-range arrays.

In the returned array, at index i, there should be the number of integers in "numbers" which are bigger or equal to low[i] and smaller or equal to high[i].

Examples:
count_range([12,13,14,15,17],[14],[14]) should return [1]
count_range([12,13,14,15,17],[14,15],[14,18]) should return [1,2]
count_range([12,13,14,15,17],[12],[17]) should return [5]

This is how I solved the question but I feel like there might be a more efficient way to solve seeing as the arrays could be long and the numbers could be really big.

I'd be glad to get some insights or tests that challenge this could to help me think in a better direction.

def binarySearch(data, val):
    highIndex = len(data) - 1
    lowIndex = 0
    while highIndex > lowIndex:
        index = math.ceil((highIndex + lowIndex) / 2)
        sub = data[index]
        if sub > val:
            if highIndex == index:
                return sorted([highIndex, lowIndex])
            highIndex = index
        else:
            if lowIndex == index:
                return sorted([highIndex, lowIndex])
            lowIndex = index
    return sorted([highIndex, lowIndex])


def count_range(numbers, low, high):
    numbers.sort()
    result = []
    range_dict = {}
    for i in range(len(numbers)):
        if numbers[i] not in range_dict:
            range_dict[numbers[i]] = i
    for i in range(len(low)):
        low_r = low[i]
        high_r = high[i]
        if low_r not in range_dict:
            range_dict[low_r] = binarySearch(numbers, low_r)[0]
        low_index = range_dict.get(low_r)
        if high_r not in range_dict:
            range_dict[high_r] = binarySearch(numbers, high_r)[0]
        high_index = range_dict.get(high_r)
        while high_index+1 < len(numbers) and numbers[high_index + 1] == numbers[high_index]:
            high_index += 1
        if low_r in numbers or low_r < numbers[0]:
            low_index -= 1
        result.append(high_index - low_index)
    print(result)
    return result

Solution

  • Using Python's bisect:

    from bisect import bisect_left, bisect_right
    
    def count_range(numbers, low, high):
        numbers.sort()
        return [bisect_right(numbers, hi) - bisect_left(numbers, lo)
                for hi, lo in zip(high, low)]
    

    In my solution, the sorting actually took about 90% of my time, then handling the queries only took about 10% of my time.

    Benchmark with 100,000 numbers and 1000 ranges (Try it online!):

      71.6 ms  count_range_Sara
      24.1 ms  count_range_Kelly
    6303.6 ms  count_range_Nin17
    
      70.5 ms  count_range_Sara
      23.1 ms  count_range_Kelly
    6225.0 ms  count_range_Nin17
    
      64.5 ms  count_range_Sara
      23.1 ms  count_range_Kelly
    6306.7 ms  count_range_Nin17
    

    (Note: Sara's solution I measured is the original slightly buggy one, which I included despite the buggyness as the question is about efficiency. And Nin17's solution that I measured is their original one from their now deleted answer, not the NumPy one.)