Search code examples
language-agnosticmedian

Finding Median WITHOUT Data Structures


(my code is written in Java but the question is agnostic; I'm just looking for an algorithm idea)

So here's the problem: I made a method that simply finds the median of a data set (given in the form of an array). Here's the implementation:

public static double getMedian(int[] numset) {
    ArrayList<Integer> anumset = new ArrayList<Integer>();
    for(int num : numset) {
        anumset.add(num);
    }
    anumset.sort(null);

    if(anumset.size() % 2 == 0) {
        return anumset.get(anumset.size() / 2);
    } else {
        return (anumset.get(anumset.size() / 2)
                   + anumset.get((anumset.size() / 2) + 1)) / 2;
    }
}

A teacher in the school that I go to then challenged me to write a method to find the median again, but without using any data structures. This includes anything that can hold more than one value, so that includes Strings, any forms of arrays, etc. I spent a long while trying to even conceive of an idea, and I was stumped. Any ideas?


Solution

  • The usual algorithm for the task is Hoare's Select algorithm. This is pretty much like a quicksort, except that in quicksort you recursively sort both halves after partitioning, but for select you only do a recursive call in the partition that contains the item of interest.

    For example, let's consider an input like this in which we're going to find the fourth element:

    [ 7, 1, 17, 21, 3, 12, 0, 5 ]

    We'll arbitrarily use the first element (7) as our pivot. We initially split it like (with the pivot marked with a *:

    [ 1, 3, 0, 5, ] *7, [ 17, 21, 12]

    We're looking for the fourth element, and 7 is the fifth element, so we then partition (only) the left side. We'll again use the first element as our pivot, giving (using { and } to mark the part of the input we're now just ignoring).

    [ 0 ] 1 [ 3, 5 ] { 7, 17, 21, 12 }

    1 has ended up as the second element, so we need to partition the items to its right (3 and 5):

    {0, 1} 3 [5] {7, 17, 21, 12}

    Using 3 as the pivot element, we end up with nothing to the left, and 5 to the right. 3 is the third element, so we need to look to its right. That's only one element, so that (5) is our median.

    By ignoring the unused side, this reduces the complexity from O(n log n) for sorting to only O(N) [though I'm abusing the notation a bit--in this case we're dealing with expected behavior, not worst case, as big-O normally does].

    There's also a median of medians algorithm if you want to assure good behavior (at the expense of being somewhat slower on average).

    This gives guaranteed O(N) complexity.