The objective:
Find the percentage of high elevation airports (elevation >= 2000) by state from the airports table.
In the query, alias the percentage column as percentage_high_elevation_airports.
Could someone explain why the following 2 SQL statements give different results:
Correct result:
SELECT state,
100.0 * sum(CASE WHEN elevation >= 2000 THEN 1 ELSE 0 END) / count(*) as percentage_high_elevation_airports
FROM airports
GROUP BY state;
sample result:
MS 0.0
MT 100.0
NC 11.1111111111111
ND 10.0
and wrong result:
select
state,
100.0 * (sum(case when elevation >= 2000 then 1 else 0 end)/count(*)) as percentage_high_elevation_airports
from airports
group by 1;
sample result:
MS 0.0
MT 100.0
NC 0.0
ND 0.0
Only difference is the additional placement of () around the sum.
I would write this as:
SELECT state,
AVG(CASE WHEN elevation >= 2000 THEN 100.0 ELSE 0 END) as percentage_high_elevation_airports
FROM airports
GROUP BY state;
The issue is integer arithmetic. Some databases do an integer division and return an integer. So, 1/2 is 0
rather than 0.5
. Some databases also apply this to avg()
(but even some that do integer division to numeric averages).
I should note that this is database-specific.