Search code examples
arraysalgorithmmedian

Median of two sorted-arrays algorithm for O(log(m+n))


Median of two sorted-arrays is a quite well known problem in leetcode. Real implementation of O(log(min(m,n))) solution is quite straight forward but not of O(log(m•n)). The basic idea is removing small half and decreasing Kth by the number of removed items on each step. As it said

we can safely cut this half, and reduce k by the length of the removed half.

But the implementation seems like it subtracts only either of the half size, not both.

if (aEnd < aStart) {
    return B[k - aStart];
}
if (bEnd < bStart) {
    return A[k - bStart];
}

Both A and B's start index could be increased (that is removed) but only one index is subtracted, B[k - aStart] or A[k - bStart].

How this code has same effect to "safely cut this half, and reduce k by the length of the removed half" each step?

The code, if (aEnd < aStart) { return B[k - aStart]; } just subtract aStart from k. I think it should be something like B[k-(aStart+bStart)] so that it also count items removed from B.


Solution

  • As mentioned, the goal of the algorithm is to find the median of two sorted arrays. The linked solution achieves this by performing Binary Search where the key idea is to recursively narrow down the search space by discarding approximately half of the elements from the arrays at each step, thus achieving a logarithmic time complexity. Since both arrays are sorted, and k is the augmented index of where the median would be had the two arrays been merged, we use binary search logic to find the median without actually merging the arrays.

    The function solve is defined to perform the binary search. It takes parameters k, aStart, aEnd, bStart, and bEnd, which represent the target index and the current search bounds within arrays A and B. The snippet you are referencing,

    if (aEnd < aStart) {
        return B[k - aStart];
    }
    if (bEnd < bStart) {
        return A[k - bStart];
    }
    

    Is the base case, handling the scenario where the segment of one array is exhausted. If we exhaust the segment of array A, we effectively reduce the problem to finding the (k - aStart)-th element in array B as all values in A have been considered.

    • k represents the k-th position we are looking for across the combined sorted arrays.
    • If all elements in A up to aStart have been considered, those elements contribute to k already.
    • The new k-th position relative to B must account for the number of elements already considered from A, hence k - aStart.

    For example, let's assume we have two sorted arrays A and B:

    A = [1, 3, 8, 15]
    B = [7, 11, 18, 19, 21]
    

    We are trying to find the median of the combined sorted arrays. The combined array would be:

    [1, 3, 7, 8, 11, 15, 18, 19, 21]
    

    The index of the median for the combined array is k = floor(9 / 2) = 4, which is corresponds to the value 11.

    The algorithm recursively narrows down the search space. You can trace the variables yourself along the recursive calls, but eventually A's segment is exhausted, aEnd < aStart, meaning all the elements in A have been considered. When the base case is hit, aStart = 3 and aEnd = 2. Therefore, the function returns

    B[k - aStart] = B[4 - 3] = B[1] = 11
    

    which is correct.

    Again, we subtract aStart from k as this narrows down the position k to reflect the remaining elements in B, accounting for the number of elements already considered in A. This adjustment ensures that even when one segment is exhausted, the algorithm still correctly identifies the k-th element in the other array.

    This logic applies similarly when the segment of array B is exhausted. By reducing k by aStart or bStart, we correctly adjust for the elements already considered and continue the search in the remaining array segment.

    Returning B[k - (aStart + bStart)] would incorrectly index B. Remember, k is the index of the median in the merged arrays. When one array is exhausted, the median lies in the other, so k should only be adjusted by the number of elements considered from the exhausted array. Subtracting aStart + bStart from k would incorrectly index into the non-exhausted array.