Search code examples
phpmysqlsqlmedian

PHP function to find the median of a column in MySQL


I have a database, db and in it a table, Table.

It looks somewhat like:

id | val
--------
1  | 45
2  | 35
3  | 23
4  | 49
5  | 67
6  | 12
7  | 0
8  | 87
9  | 46

(This is just an example data set. Actual data set is huge. And I need to work in least time possible.)

I need to find the median of the column val. Actually I need a php function to be used multiple times.

A similar question does exist: Simple way to calculate median with MySQL

I tried a few answers in this question, none of them worked for me. The accepted answer doesn't work since it used to work with an older version of SQL only.

PS: It should also work in the case of many duplicates.


Solution

  • just for fun i thought i try and do it all in MySQL, here's the sqlFiddle

    SELECT 
      CASE 
      WHEN MOD((select count(*) as count from t),2)=1 THEN
          (select val from
               (select @row:=@row+1 as row,val
                from t,(select @row:=0)r
                order by val)t1
           where t1.row = CEIL((select count(*) as count from t)/2)
          )
      ELSE
           ((select val from
               (select @row:=@row+1 as row,val
                from t,(select @row:=0)r
                order by val)t1
             where t1.row = (select count(*) as count from t)/2)+
            (select val from
               (select @row:=@row+1 as row,val
                from t,(select @row:=0)r
                order by val)t1
             where t1.row = ((select count(*) as count from t)/2)+1))/2 
      END AS median
    

    Just replace occurences of t with your table name, don't change t1. Also if the table has no rows, it'll return NULL as median.

    This query can be further reduced to the below (sqlFiddle)

    SELECT @rowCount:=(select count(*) as count from t) AS rowcount,
          (select AVG(val) from
               (select @row:=@row+1 as row,val
                from t,(select @row:=0)r
                order by val)t1
           where t1.row IN (FLOOR((@rowCount+1)/2),
                             CEIL((@rowCount+1)/2)
                            )
          ) as Median
    

    It'll return 2 columns, a rowcount column and a median column. I put the rowcount column there because i didn't want to count from t multiple times like previous query.