So for an assignment at school, we had to extract a count from a database. The question was as follows,
--19) How many airports in a timezone name containing 'Europe' are used as a source airport (source_airport_id) for a route that has aircraft that have a wake of 'M' or 'L'
This was the code I came up with,
SELECT count(DISTINCT airports.id) FROM airports WHERE timezone_name LIKE '%Europe%' AND id IN
(SELECT source_airport_id FROM routes WHERE id IN
(SELECT id FROM route_aircrafts WHERE aircraft_id IN
(SELECT id FROM aircrafts WHERE wake_size IN ('M', 'L'))));
it returned 544, while the professors answer returned 566.
SELECT count (DISTINCT airports.id)
FROM airports, routes, route_aircrafts, aircrafts
WHERE airports.id = routes.source_airport_id
AND routes.id = route_aircrafts.route_id
AND aircrafts.id = route_aircrafts.aircraft_id
AND airports.timezone_name LIKE'%Europe%'
AND aircrafts.wake_size IN ('M', 'L'); --566
To me, those two should be doing the same thing and I can't understand why the answers are different.
To get the same answer in your query you need:
SELECT count(DISTINCT airports.id) FROM airports WHERE timezone_name LIKE '%Europe%' AND id IN
(SELECT source_airport_id FROM routes WHERE id IN
(SELECT route_id FROM route_aircrafts WHERE aircraft_id IN
(SELECT id FROM aircrafts WHERE wake_size IN ('M', 'L'))));
You'd used the primary ID field rather than the foreign key route_id. You were getting an approximately similar result because there must be a significant overlap in the values.