I have been working on this exercise on Codecademy for several days and still couldn't understand the logic behind it.Below is the exercise and code:
It would be interesting to order flights by giving them a sequence number based on time, by carrier. For instance, assuming flight_id increments with each additional flight, we could use the following query to view flights by carrier, flight id, and sequence number:
SELECT carrier, id,
(SELECT COUNT(*)
FROM flights f
WHERE f.id < flights.id
AND f.carrier=flights.carrier) + 1 AS flight_sequence_number
FROM flights;
I can understand that f
is a virtual form of table flights
, but what does f.id < flights.id
do? Does it mean that SQL compares each row in f
with each row in flights
like
compare MQ 17107 with MQ 7869,
compare MQ 17107 with MQ 2205,
compare MQ 17107 with MQ 14979
……
compare MQ 7869 with MQ 2205,
compare MQ 7869 with MQ 14979
……
Beside, what does this COUNT(*)
really count? And why the+1
?
This is the result image: query result
Any help will be appreciated. Thanks.
The query selects records from table flights. For each record it selects carrier, id and the flight_sequence_number.
So for every record in flights the subquery gets executed. It reads the table flights again, but only takes records with the same carrier and a flight ID smaller than the one of the main record. In order to talk about the main query record and the sub query record, you need one or two table aliases, for otherwise both records would be called flight as in flight.id, and it would not be clear which one you are talking of. So the table in the inner query gets the alias f. Now you can compare f.id < flights.id and f.carrier = flights.carrier.
COUNT(*) counts hence all records lower than a main record's flight ID for its carrier and thus numbers the carrier's rows. For the smallest ID you find no smaller IDs so the count is 0 and you add one thus getting row number 1. For the second smallest you find one record with a smaller ID, hence you get count 1, add 1 and get row number 2 and so on.
The results will look better, when you add an order by clause ORDER BY carrier, id
, so you show the results in the order used.
As has been mentioned, several modern DBMS offer analytic functions, such as ROW_NUMBER
and the query becomes much simpler:
select
carrier,
id,
row_number() over (partition by carrier order by id) as flight_sequence_number
from flights
order by carrier, id;