Search code examples
algorithmsortingsearchbinary-search

Find ranges of grouped numbers (log n)


Assume an array of grouped numbers:

arr = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 8, 8, 8, 8, 8, 4, 4, 4, 4]

I want to find the range for each number, assuming that all numbers are grouped together in the input array. The output should be:

1: (0,5)
2: (6,10)
3: (11,14)
8: (15,19)
4: (20,23) 

I applied an alteration of binary search, but the output I'm getting is erronous.

I'm looking for a solution with log(n) Time complexity

This is a simplified version of a problem I'm dealing with building an ETL pipeline where visiting an element requires downloading a 100gb file.


Solution

  • I'm looking for a solution with log(n) Time complexity

    That's not possible in the general case; the best you can do is O(r + log n) time, or (equivalently) O(MAX(r, log n)) time, where r is the number of ranges. (This is because you need to build the list of r ranges in order to return it, and you can't do that in less than Θ(r) time.)

    I used the idea of binary search as in dividing the interval in half, and dividing a sub interval only if the midpoint is different than its endpoint.

    That's roughly the right approach, but the devil is in the details.

    What I'd suggest is doing r separate binary searches — one for each range, from left to right — but using a stack to keep track of values that you've previously fetched but are further right than the current range. You can use that stack to quickly find initial 'lo' and 'hi' values for the binary search. That way, you can achieve the asymptotically optimal O(r + log n) worst-case time.

    Here's a JavaScript implementation of that concept:

    function doIt(arr) {
      const stack = [];
      if (arr.length > 0) {
        stack.push({ pos: arr.length - 1, val: arr[arr.length - 1] });
      }
      stack.push({ pos: 0, val: arr[0] });
    
      const result = [];
    
      while (stack.length > 0) {
        const start = stack.pop();
    
        let lo = start.pos;
        while (stack.length > 0 && stack.at(-1).val === start.val) {
          lo = stack.pop().pos;
        }
    
        if (stack.length > 0) {
          while (true) {
            const hi = stack.at(-1).pos;
    
            if (lo + 1 === hi) {
              break;
            }
    
            const midPos = Math.trunc(lo + (hi - lo) / 2);
            const midVal = arr[midPos];
            if (midVal === start.val) {
              lo = midPos;
            } else {
              stack.push({ pos: midPos, val: midVal });
            }
          }
        }
    
        result.push({ start: start.pos, end: lo, val: start.val });
      }
    
      return result;
    }
    
    const arr = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 8, 8, 8, 8, 8, 4, 4, 4, 4];
    
    console.log(JSON.stringify(doIt(arr)));