Search code examples
algorithmmedian

Minimum number of elements to add to an array, to make its median equal to x


I'm trying to understand an algorithm that, given an array A and an integer x, returns the minimum number of elements that would need to be added to A in order for x to be the median.

The algorithm looks like this:

consider the median to always be the element at position (n-1)/2

n := |A|

m := 0
for i := 1 to n:
    if (A[i] < x) then:
        m++

if (m < n - m) then:
    return (n - 2 * m)
else:
    return (2 * m – n + 1)

I understand that if m is equal to n-m, then the array size is even, and so we can add the wanted median at the (n-1)/2 position, and that will be our new median.

I'm struggling with the case when m is lower then n-m, and we return (n-2m)

or m is greater then n-m, and then return (2*m-n+1)

**why is this true?

I couldn't find proof if you can provide one, or maybe explain in simple words that will be very helpful!**


Solution

  • I suspect that you're misinterpreting what's meant by the median being the element at position (n - 1) / 2. It's not that the median element is the element that is that exact position in the array. Rather, it's the element that would be at that position if you were to sort the array.

    For example, consider the array

    [31, 41, 59, 26, 53, 58, 97]
    

    and imagine we want to make the number 93 the median. Let's think about what we'd have to do in order to make that happen. For starters, for 93 to be the median, we need to add it to the array, as shown here:

    [31, 41, 59, 26, 53, 58, 97, 93]
                                 ^^
    

    Now, what position would 93 be in if we were to sort the array? Let's actually sort the array to find out:

    [26, 31, 41, 53, 58, 59, 93, 97]
                             ^^
    

    Notice that it's at position 6 in the array. That's not the middle of the array, and in order for this element to end up at the middle position, we'd need to insert a bunch more values that come after 93, enough to pad the array so that 93 is in the middle. For example, we could add in some extra 99's, as shown here:

    [26, 31, 41, 53, 58, 59, 93, 97, 99, 99, 99, 99, 99]
                             ^^      ^^  ^^  ^^  ^^  ^^
    

    So in this case, we'd have to add a total of five elements to the array to get 93 to be the median.

    Now, here's a question to consider: why did we get an answer of five? The reason for this is the following. Following the notation from your algorithm, we need to make sure that the number of elements less than 93, which we'll denote by m, is equal to the number of elements that are greater than 93. The number of elements greater than 93 that currently exist in the array is given by n - m, since that's all the elements of the array, minus the ones less than 93.

    So let's suppose that we add 93 itself to the array, along with k elements bigger than 93 to the array to try to place 93 at the median position. From what we've said above, we need to have that the number of elements less than 93 (m) must be one less than the number of elements in the overall array greater than or equal to 93 (n - m, the elements already greater than or equal to 93, plus k, the number of extra elements we added). The -1 term here accounts for the fact that, when we do the division to compute the median position, we round down. That means that we'd have

    m - 1 = n - m + k

    or, that

    k = 2m - n + 1

    And hey, look! It's 2*m - n + 1, which is what we return in the case where mn - m, meaning that the number is greater than or equal to more than half of the array elements.

    On the other hand, let's return to our original array, as shown here:

    [31, 41, 59, 26, 53, 58, 97]
    

    Imagine we want to make 27 the median element. We begin by adding 27, as shown here:

    [31, 41, 59, 26, 53, 58, 97, 23]
                                 ^^
    

    Now, if we sort the elements, we get the following:

    [26, 27, 31, 41, 53, 58, 59, 97]
         ^^
    

    This isn't in the median position, so we're going to have to add some more elements in to pad the array. Let's imagine adding a bunch of smaller elements to pad things out, like this:

    [13, 13, 13, 13, 26, 27, 31, 41, 53, 58, 59, 97]
     ^^  ^^  ^^  ^^      ^^
    

    Looks like we need five elements again this time. Why is that? As before, suppose we're adding k total elements to the array. We'd need to have that

    • the number of elements less than 27 (m), plus the number of elements added in (k), is one less than

    • the number of elements greater than or equal to 27 (n - m), minus one because the newly-added 27 will count as one of the elements greater than or equal to itself.

    In other words, we need

    m + k = n - m,

    or that

    k = n - 2m.

    That gives us the n - 2 * m that makes up the second case.

    So to summarize - the math here arises because we're trying to pad the array to get the side containing the number to add to have the same length as the smaller side, and the math for how to do that depends on whether we're in the first or second half of the array.

    Hope this helps!