Search code examples
javascripttypescriptalgorithmbinary-search

Binary Search for multiple items within a Range (Log time filter)


I have an Array of Log items, already sorted by their timestamp (number of milliseconds since 1970). Now I want to filter them by a specific time range, so I think of Binary Search, however this variant is different than all variants I knew before as I need to find a range within a range. Note that there may be none or multiple items at the value edges.

I came up with this to reduce one range requirement but still don't know how to get to the first/last edge items:

    filterByTime(min: number, max: number): LogItem[] {
        const items = this.items;
        const len = items.length;

        if (min === undefined && max === undefined) {
            return items.slice(0, len - 1);
        }
        min = min || Number.MIN_VALUE;
        max = max || Number.MAX_VALUE;

        const fn = (a: LogItem) => {
            if (a.time < min) {
                return -1;
            } else if (a.time > max) {
                return 1;
            } else {
                return 0;
            }
        };

        const lowerBound = this.binarySearchBound(fn, 0, len, true);
        if (lowerBound == -1) { return []; }

        const upperBound = this.binarySearchBound(fn, 0, len, false);

        return items.slice(lowerBound, upperBound + 1);
    }

    binarySearchBound(compareFn: (a: LogItem) => number, left: number, right: number, isLowerBound: boolean): number {
        const items = this.items;
        if (items.length == 0) {
            return -1;
        }

        if (isLowerBound && compareFn(items[0]) == 0) {
            return 0;
        } else if (!isLowerBound && compareFn(items[items.length - 1]) == 0) {
            return items.length - 1;
        }

        while (left <= right) {
            const mid = (left + right) / 2;
            const item = this.items[mid];

            const compare = compareFn(item);
            if (compare < 0) {
                left = mid + 1;
            } else if (compare > 0) {
                right = mid - 1;
            } else {
                // What to do now?
            }
        }

        return -1;
    }

Worst case scenario, I can just do a linear search from the edge since I can assume there are not that much items at the edge but surely there is a better way I didn't think of but then I may have to iterate through the whole result set if mid falls in the middle of the result set.

EDIT for adding a note: It's possible for min or max is undefined (and could be both, in which case I can just set an if and return the copy of the whole array). Is it better to just substitute it with MIN_VALUE and MAX_VALUE if they are undefined, or is there a better way to handle that case?


Solution

  • I would suggest the following:

    • Write two binary search functions, as the execution time is then not hampered by passing and checking the isLowerBound boolean.

    • Make the returned upperBound mean the next index after the potential last index that belongs to the range. This corresponds with how arguments work with native functions like slice.

    • Don't use -1 as a special index. If coded well, an empty range will come out of the two binary searches anyway and give an empty array as result

    • Make the compare function work with 2 parameters, so you can actually search for either the min or the max value.

    • Yes, I would use MIN_VALUE and MAX_VALUE as defaults and not test for boundary cases. If boundary cases happen often, it might be worth including those checks, but in general be aware that these checks will then be executed for every filter, which may impact the average execution time.

    Here is the suggested implementation with integer data (instead of objects) to keep it simple. In order to have it run in a snippet I also removed the type references:

    function filterByTime(min=Number.MIN_VALUE, max=Number.MAX_VALUE) {
        const fn = (a, b) => a - b; // simplified (should be a.time - b.time)
    
        const lowerBound = this.binarySearchLowerBound(fn, 0, this.items.length, min);
        const upperBound = this.binarySearchUpperBound(fn, lowerBound, this.items.length, max);
    
        return this.items.slice(lowerBound, upperBound);
    }
    
    function binarySearchLowerBound(compareFn, left, right, target) {
        while (left < right) {
            const mid = (left + right) >> 1;
            if (compareFn(this.items[mid], target) < 0) {
                left = mid + 1;
            } else { // Also when equal...
                right = mid;
            }
        }
        return left;
    }
    
    function binarySearchUpperBound(compareFn, left, right, target) {
        while (left < right) {
            const mid = (left + right) >> 1;
            if (compareFn(this.items[mid], target) <= 0) { // Also when equal...
                left = mid + 1;
            } else {
                right = mid;
            }
        }
        return left;
    }
    
    // Demo with simplified data (instead of objects with time property)
    this.items = [1, 2, 2, 2, 3, 4, 4, 5, 5, 5, 6, 7, 8, 8];
    console.log(this.filterByTime(2, 4));
    console.log(this.filterByTime(4, 5));