Search code examples
sqlsql-servermedian

Date’s First Median by SQL


"SQL for Smarties" book shows a sample code for computing Date's First Median.

I put the example here. http://sqlfiddle.com/#!3/c69520/1

---duplicate to ensure even number of rows

CREATE VIEW Temp1
AS SELECT weight FROM Parts
UNION ALL
SELECT weight FROM Parts;

---below this part is what I didn't understand how it works

CREATE VIEW Temp2
AS SELECT weight
FROM Temp1
WHERE 
(SELECT COUNT(*) FROM Parts)
<= (SELECT COUNT(*)
FROM Temp1 AS T1
WHERE T1.weight >= Temp1.weight)
AND (SELECT COUNT(*) FROM Parts)
<= (SELECT COUNT(*)
FROM Temp1 AS T2
WHERE T2.weight <= Temp1.weight);

SELECT AVG(DISTINCT weight) AS median
FROM Temp2;

The result is correct though this solution is expensive in terms of time and storage.

Really eager to know how does this part work?

I tried to see the result of inner SQL statement.

(SELECT COUNT(*)
FROM Temp1 AS T1
WHERE T1.weight >= Temp1.weight)

And got the message

The multi-part identifier "Temp1.weight" could not be bound.

How to understand this SQL?


Solution

  • CREATE VIEW Temp2
    AS SELECT weight
    FROM Temp1
    WHERE 
        (SELECT COUNT(*) FROM Parts) <= (SELECT COUNT(*)
                                           FROM Temp1 AS T1
                                           WHERE T1.weight >= Temp1.weight
                                         )
    AND 
        (SELECT COUNT(*) FROM Parts) <= (SELECT COUNT(*)
                                        FROM Temp1 AS T2
                                        WHERE T2.weight <= Temp1.weight
                                     );
    

    I suppose you mean in the above code. I did som reformat of it make it a bit clearer (at least for me). In the subquery you override the name Temp1 to T1. Meaning in the subquery when you say t1.weight you are refereing to the subquerys weight column. But Temp1.weight will be the parent querys Temp1 table. What it means is that for each row in the Temp1 table it will do a new query looking through all rows of Temp1 and check against the current rows value. So 10 rows will be like 200 rows, then of course with the correct indexes and so on the actual rowcount won't need to be as bad.

    The reason you get

    The multi-part identifier "Temp1.weight" could not be bound.

    Is because you are trying to run the inner query without the outer query, hence to server has no idea what you mean by Temp1. It only knows about T1.