Search code examples
mysqlsqlmedian

This is a code to calculate the median of values in a table in mysql , I need some expert to explain the sequence in which the commands are executing


 set @ct :=(select count(*) from medi);
 set @ro :=0;
 select  avg(num) as median from (select * from medi order by num)  
 where (select @ro:= @ro+1)      
 between @ct/2.0 AND @ct/2.0+1;

Values in table
+------+
| num |
+------+
| 2 |
| 55 |
| 63 |
| 85 |
| 32 |
| 15 |
| 3 |
| 36 |
| 69 |
+------+
Need help to understand the sequence of execution and the The role of @ro in where clause


Solution

  • Well to begin with this query produces the incorrect result. It produced 32 the median, which is incorrect. The median should be 36

    The median is the value separating the higher half of a data sample, a population, or a probability distribution, from the lower half. In simple terms, it may be thought of as the "middle" value of a data set. For example, in the data set {1, 3, 3, 5, 9, 11, 13}, the median is 5, the fourth number in the sample. The median is a commonly used measure of the properties of a data set in statistics and probability theory.

    Ref: https://en.wikipedia.org/wiki/Median

    The problem with this bit of SQL is that it does not order the values which is required for determining the median.

    This stackoverflow Q&A: Simple way to calculate median with MySQL has lots of different ways to calculate median and with explanations given.

    If you really want to know what the above query does
    Step1: Count the number of rows in the table
    Step2: Create a variable and asign it to zero
    Step3: Use the variable as an number that increments by one for each row to find the halfway mark of the table. Take the average of the numbers that lie next to the halfway mark.

    Now you will realize why it's wrong, it does not impose any ordering on the contents of the table.