Search code examples
mysqlsqlaveragewindow-functionsmedian

MySQL: Calculating Median of Values grouped by a Column


I have the following table:

+------------+-------+
| SchoolName | Marks |
+------------+-------+
| A          |    71 |
| A          |    71 |
| A          |    71 |
| B          |   254 |
| B          |   135 |
| B          |   453 |
| B          |   153 |
| C          |   453 |
| C          |   344 |
| C          |   223 |
| B          |   453 |
| D          |   300 |
| D          |   167 |
+------------+-------+

And here is the average of marks grouped by school names:

+------------+------------+
| SchoolName | avg(Marks) |
+------------+------------+
| A          |    71.0000 |
| B          |   289.6000 |
| C          |   340.0000 |
| D          |   233.5000 |
+------------+------------+

https://www.db-fiddle.com/f/5t7N3Vx8FSQmwUJgKLqjfK/9

However rather than average, I want to calculate median of the marks grouped by school names.

I am using,

SELECT AVG(dd.Marks) as median_val
FROM (
SELECT d.Marks, @rownum:=@rownum+1 as `row_number`, @total_rows:=@rownum
  FROM tablename d, (SELECT @rownum:=0) r
  WHERE d.Marks is NOT NULL
  ORDER BY d.Marks
) as dd
WHERE dd.row_number IN ( FLOOR((@total_rows+1)/2), FLOOR((@total_rows+2)/2) );

to calculate the average of entire Marks column, but I don't know how to do it for each school separately.


Solution

  • Your query computes row numbers using user variables, which makes it more complicated to handle partitions. Since you are using MySQL 8.0, I would suggest using window functions instead.

    This should get you close to what you expect:

    select 
        SchoolName, 
        avg(Marks) as median_val
    from (
    select 
        SchoolName,
        Marks, 
        row_number() over(partition by SchoolName order by Marks) rn,
        count(*) over(partition by SchoolName) cnt
      from tablename
    ) as dd
    where rn in ( FLOOR((cnt + 1) / 2), FLOOR( (cnt + 2) / 2) )
    group by SchoolName
    

    The arithmetic stays the same, but we are using window functions in groups of records having the same SchoolName (instead of a global partition in your initial query). Then, the outer query filters and aggregates by SchoolName.

    In your DB Fiddlde, this returns:

    | SchoolName | median_val |
    | ---------- | ---------- |
    | A          | 71         |
    | B          | 254        |
    | C          | 344        |
    | D          | 233.5      |