Search code examples
algorithmdivide-and-conquer

divide and conquer - find median between two arrays of equal size that contain unique elements?


I am trying to solve a problem exactly like this: nth smallest number among two databases of size n each using divide and conquer

From what I could figure out, the "comparing medians/median of medians" algorithm would give us the solution? My question is whether I am understanding this correctly.

array 1: [7 8 6 5 3]
array 2: [4 10 1 2 9] 

First, find the median for each. we can do this by querying for k=n/2, where n is the size of that array. Being the 3rd smallest element in this case, this gives us 6 for the first array (call this m1), and 4 for the second array (call this m2).

Since m1 > m2, create 2 arrays using the elements that are less than m1 and greater than m2 in that array.

array 1: [5 3]
array 2: [10 9]

^ How would we find the elements that are less than m1 and greater than m2? Would we just take m1 and m2 and compare them with every element in their respective arrays? I know this works when the two arrays are both sorted, but would sorting them first allow us to still get O(log(n)) queries?

I'm assuming we can continue to use our special query (can we?) to get the k=n/2 smallest element (median) for that particular array. If this is the case, we query for k=n/2=1, leaving us with new m1 = 3, m2 = 9.

m1 < m2, so we make 2 arrays using elements that are greater than m1 and less than m2 in that array.

Since there are no elements in array 2 that are less than m2 = 9, we are only left with one array with one element greater than m1 = 3.

[5] <- this is the median

I am also interested in seeing the proof of correctness (that this finds the median) by induction.


Solution

  • The O(n) meidan of median algorithm actually partitions the array so that the elements before it are less than it and after it are greater than it.

    When you recurse with the median of medians as pivot, you are partitioning the array so that it looks like

    (elements less than the median) - p - (elements greater than the median)

    On the correctness, when you first query for k = n/2. You get m1 and m2(m1 > m2). Now you know that there are more than n elements that are less than m1. so elements following it will never be candidates for the median.
    Similarly elements before m2. there are more than n elements ahead of them, so they will never be a candidate for the median. So the median must lie somewhere in the second half of the second array and the first half of the first array.

    But now when you recurse you should keep in mind that you have n/2 elements of the second array counted for, so you need to find the element that would occupy the n/2th position in the sorted union of the two arrays(second half and first half).

    This seems asymptotically optimal since you're always reducing the size of the arrays you are recursing on to half. something like O(n) + O(n/2) + O(n/4) ... = O(n).

    For sorted arrays you can do this is O(logn).