
Determine median element of a nested array in Ruby?


I need a method in Ruby that calculates the median and also works with nested arrays, similar to "uniq" and "sort_by": with those, I can use a block to define which of the nested array values should be taken into consideration.

class Array
   def median
      . . .
   end
end

puts [[1,3],[2,5],[3,-4]].median{|z,w| z}

=> [2,5]

puts [[1,3],[2,5],[3,-4]].median{|z,w| w}

=> [1,3]

I am sure I should deal with "yield" somehow, but I don't know exactly how.


Solution

  • Since finding the median requires sorted data, you can delegate to sort_by and work on its result. Capturing the block as &block and passing it along means you don't need to call yield yourself:

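    class Array
      # Returns the middle element after sorting by the block's return value
      # (or by the elements themselves when no block is given).
      # For an even-length array, returns the two central elements.
      def median(&block)
        block = :itself unless block_given?
    
        sorted = sort_by(&block)
        if length.odd?
          sorted[sorted.length / 2]
        else
          sorted[sorted.length / 2 - 1, 2]
        end
      end
    end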
    

    Sample runs:

    [13, 23, 11, 16, 15, 10, 26].median # => 15
    # contrived example showing the block is called once per element
    count = 0; [13, 23, 11, 16, 15, 10, 26].median { |a| count += 1 } # => 16
    # even length data set
    # usually you'd average these, but that becomes trickier with nested arrays
    [14, 13, 23, 11, 16, 15, 10, 26].median # =>  [14, 15]
    
    # your examples:
    [[1,3], [2,5], [3,-4]].median { |z,_| z} # => [2, 5]
    [[1,3], [2,5], [3,-4]].median { |_,w| w } # => [1, 3]
    
    # added [6, -6] to your examples:
    [[1,3], [2,5], [3,-4], [6, -6]].median { |z,_| z } # => [[2, 5], [3, -4]]
    [[1,3], [2,5], [3,-4], [6, -6]].median { |_,w| w } # => [[3, -4], [1, 3]]
    

    You don't specify what should happen for even-length arrays. For a mathematical median (if I remember my maths correctly) you would average the two center-most elements, but that raises the question of what the average of two different arrays looks like. This implementation takes the simple (for us) approach of returning both center elements and leaving it to the caller to decide how to handle them. The elements might not even be nested arrays: it could be a list of people, for instance, and you want the median by last name.
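As a rough illustration of how a caller might handle the even-length case, here is a minimal sketch. The `median_by` helper is hypothetical (it is not part of the answer's code); it is a standalone method, so it avoids reopening Array, and it uses `yield` as the question anticipated. Averaging only the sort key is one possible choice, not the only sensible one.

```ruby
# Hypothetical helper (an assumption for illustration, not the answer's code):
# returns the median element(s) of a collection without reopening Array.
def median_by(items)
  # With no block, compare the elements themselves.
  sorted = block_given? ? items.sort_by { |item| yield item } : items.sort
  mid = sorted.length / 2
  # Odd length: one middle element. Even length: the two central elements.
  sorted.length.odd? ? sorted[mid] : sorted[mid - 1, 2]
end

pairs = [[1, 3], [2, 5], [3, -4], [6, -6]]

# The caller decides what "average" means; here only the sort key is averaged.
central = median_by(pairs) { |z, _| z }
median_key = central.sum { |z, _| z } / 2.0
```

Here `central` holds the two middle pairs and `median_key` is the mean of their first values. For non-numeric keys such as last names, the caller would more likely pick one of the two elements than try to average them.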