Search code examples
sqlgroup-byinner-joingreatest-n-per-group

SQL query only working with one column and stops working as soon as I add the second column


I've being trying to make this query work but I haven't managed to make it do so, here is what I've managed to do so far:

select c.EID
from certified c
    inner join aircrafts a ON c.AID = a.AID
group by c.EID 
HAVING count(c.EID)  >= 3 and MAX(a.CRUISINGRANGE);

RESULTS:

EID
'141582651'

'142519864'

'269734834'

'390487451'

'552455318'

'556784565'

'567354612'

'573284895'

As you can see, I have an inner join with the aircraft table in order to get the max cruising range, from certified I count and return the pilots that can pilot more than 3 planes.

Now my issue is, I can't seem to be able to add the a.AID on my select in order to display it, here's what I've tried:

select c.EID, a.AID
from certified c
    inner join aircrafts a ON c.AID = a.AID
group by c.EID, a.AID
HAVING count(c.EID)  >= 3 and MAX(a.CRUISINGRANGE);

RESULTS:

EID AID

0rows

EXPECTED RESULT:

EID          AID

'141582651'  '6475'

'142519864'  '8430'

'269734834'   '8430'

'390487451'   '7120'

'552455318'   '3383'

'556784565'   '7120'

'567354612'   '8430'

'573284895'   '7120'

but I get 0 rows while trying to do so.

DATA SAMPLE: 
Certified: 
EID        AID
142519864   1
269734834   1
550156548   1
567354612   1
11564812    2
141582651   2
142519864   2
242518965   2
269734834   2
552455318   2
556784565   2


AIRCRAFTS
AID    ANAME         CRUISING RANGE
1   Boeing 747-400  8430
2   Boeing 737-800  3383
3   Airbus A340-300 7120
4   British Aerospace Jetstream 41  1502
5   Embraer ERJ-145 1530
6   SAAB 340    2128
7   Piper Archer III    520
8   Tupolev 154 4103
9   Lockheed L1011  6900
10  Boeing 757-300  4010
11  Boeing 777-300  6441
12  Boeing 767-400ER    6475
13  Airbus A320 2605
14  Airbus A319 1805
15  Boeing 727  1504
16  Schwitzer 2-33  30
123 Airbus  1000
302 Boeing  5000
306 Jet01   5000
378 Airbus380   8000

Solution

  • Your question is really hard to follow; you don't seem to have actually stated what you want in a consistent and coherent manner. From reading the comments it seems that you would like to know:

    The furthest that each pilot (who is rated for 3 or more aircraft) may fly.

    You are thus expecting a list of pilots EIDs, and the max cruising range (from the aircraft table) across all the planes that pilot may fly. The pilot must be able to fly at least 3 planes.

    Steps:

    Build the list of pilots and the distances the aircraft they are rated for can fly:

    SELECT c.EID, a.cruising_range
    FROM certified c INNER JOIN aircrafts a ON c.AID = a.AID
    

    Improve the query to count the ratings and the max cruising range per pilot (A pilot can fly a plane, a plane has a range, therefore when a pilot is flying a plane, a pilot has a range):

    SELECT c.EID, COUNT(*) as count_ratings, MAX(a.cruising_range) max_range
    FROM certified c INNER JOIN aircrafts a ON c.AID = a.AID
    GROUP BY c.EID --per pilot
    

    Improve the query to select only the pilots who can fly at least 3 aircraft:

    SELECT c.EID, max_range FROM
    (
      SELECT c.EID, COUNT(*) as count_ratings, MAX(a.cruising_range) max_range
      FROM certified c INNER JOIN aircrafts a ON c.AID = a.AID
      GROUP BY c.EID --per pilot
    ) pilot_ranges
    WHERE
      pilot_ranges.count_ratings >= 3
    

    I could have used HAVING:

    SELECT c.EID, MAX(a.cruising_range) max_range
    FROM certified c INNER JOIN aircrafts a ON c.AID = a.AID
    GROUP BY c.EID 
    HAVING COUNT(*) >= 3
    

    But sometimes there's a benefit in laying out the query visually in a set-of-steps-rather-than-trying-to-do-it-all-at-once, and using subqueries to mentally form a list of data, then change it, filter it, add to it.. all in stages.

    The query optimizer will likely rewrite these queries internally so they end up executing identically anyway, so the benefit is for the human in understanding and future maintenance

    Edit: Here's a query that lists the EID and AID for every pilot-aircraft combination, where the pilot is capable of flying 3+ planes:

    SELECT c.EID, a.AID FROM
    (
      SELECT c.EID
      FROM certified c
      GROUP BY c.EID 
      HAVING count(*) >= 3
    ) find_pilots_can_fly_atleast_three
    INNER JOIN certified c ON c.EID = find_pilots_can_fly_atleast_three.EID
    INNER JOIN aircrafts a ON c.AID = a.AID
    

    First you generate a list of pilots that are interesting - that's the subuery find_pilots_can_fly_atleast_three, then we join that data back to certified and aircrafts. We end up with a list of pilots who can fly 3 planes, and we get details about the planes

    You might have been trying to do too much in one go. When you group, you lose certain bits of data. In grouping up the pilots in order to find those who can fly 3+ planes, we MUST lose the aircraft ID, and keep only the pilot ID. If you try and keep the aircraft ID (because you want it), you end up with groups that only have a count of 1, because each pilot-aircraft combination is unique. So, we lose the aircraft id into an aggregation (count) and keep the pilot id. This helps us find out how many planes each pilot can fly. To find out which planes those pilots can fly, we MUST then join that pilot id from the "find 3+" query back to the list of pilot-aircrafts (certified table) to expand it back to a list of pilots-with-aircraft

    Always remember that you're allowed to join a table to itself, and indeed it's vital in cases like this. If you had a table full of addresses with dates when you lived at that address, to find the latest one, you'd have a query that got your_id, max(lived_there_until_date but you can't keep any of the other address data - you want the date of the latest record.. And then if you DO want the rest of the data from that latest row, you join the query you just did back onto addresses table to retrieve the entire row with the latest lived_there_until date