mysql statistics median

Calculating the Median with MySQL


I'm having trouble calculating the median (not the average) of a list of values.

I found this article: "Simple way to calculate median with MySQL".

It references the following query, which I don't fully understand.

SELECT x.val FROM data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*)+1)/2

If I have a time column and I want to calculate its median value, what do x and y refer to?


Solution

  • val is your time column; x and y are two aliases for the same data table (you can write data AS x, data AS y), so the query pairs every row of the table with every other row.

    EDIT: To avoid computing your sums twice, you can store the intermediate results.

    CREATE TEMPORARY TABLE average_user_total_time
        (SELECT SUM(time) AS time_taken
         FROM scores
         WHERE created_at >= '2010-10-10'
           AND created_at <= '2010-11-11'
         GROUP BY user_id);

    Then you can compute the median over these values, which are now in a named table.

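    EDIT: A temporary table won't work here, because MySQL cannot refer to the same TEMPORARY table more than once in a single query, and the median query joins the table to itself. You could try a regular table that uses the MEMORY storage engine instead, or simply repeat the subquery that computes the values in both places where the median query needs it. Apart from this, I don't see another solution. That doesn't mean there isn't a better way; maybe somebody else will come up with an idea.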
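
To see what the two aliases in the question's query actually do, the self-join can be run on a toy table. This is only a sketch under a few assumptions: it uses Python's built-in sqlite3 module rather than MySQL, the data table contents and the median_rows helper are made up for illustration, SIGN is registered by hand, and the divisor is written as 2.0 so that SQLite divides without truncating, as MySQL's / does.

```python
import sqlite3

# The query from the question, with one change: "/ 2.0" instead of "/ 2",
# because SQLite truncates integer division while MySQL's "/" does not.
MEDIAN_SQL = """
SELECT x.val
FROM data x, data y          -- every row x is paired with every row y
GROUP BY x.val
HAVING SUM(SIGN(1 - SIGN(y.val - x.val))) = (COUNT(*) + 1) / 2.0
"""


def median_rows(values):
    """Load values into a throwaway table and return the rows the query keeps.

    Hypothetical helper for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    # Register SIGN ourselves so the sketch does not depend on the SQLite version.
    conn.create_function("SIGN", 1, lambda v: (v > 0) - (v < 0))
    conn.execute("CREATE TABLE data (val INTEGER)")
    conn.executemany("INSERT INTO data (val) VALUES (?)", [(v,) for v in values])
    rows = [row[0] for row in conn.execute(MEDIAN_SQL)]
    conn.close()
    return rows
```

SIGN(1-SIGN(y.val-x.val)) is 1 when y.val <= x.val and 0 otherwise, so for each x the SUM counts how many rows are less than or equal to it, and the HAVING clause keeps the x whose count lands on the middle position. With five distinct values, median_rows([30, 10, 50, 20, 40]) returns [30]. With four values, (COUNT(*)+1)/2 is 2.5, no count can equal it, and the result is empty, so this query as written only suits an odd number of rows.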