Search code examples
sqlsql-serversubquerycorrelated-subquery

Calculating Percentage of Various Categories that Meet Conditions with SQL Server


I have a table with information about flights between cities that looks like this:

    origin_city dest_city   time    
    Dothan AL   Atlanta GA    171    
    Dothan AL   Atlanta GA    171    
    Dothan AL   Elsewhere AL    2    
    Dothan AL   Elsewhere AL    2    
    Dothan AL   Elsewhere AL    2    
    Boston MA   New York NY     5    
    Boston MA   City MA         1    
    New York NY Boston MA       5    
    New York NY Boston MA       5    
    New York NY Boston MA       5    
    New York NY Poughkipsie NY  2

I want to find, for each origin city, the percentage of flights that are less than 3 hours long. So the results would be like this:

    Dothan AL    60
    Boston MA    50
    New York NY  25

The code I thought would work looks like this:

     SELECT F.origin_city as origin_city,    
       ((SELECT COUNT(*) FROM Flights as F2
       WHERE F2.actual_time < 3) / (SELECT COUNT(*) FROM Flights as  F3)) * 100
     AS percentage
     FROM Flights as F
     GROUP BY F.origin_city
     ORDER BY percentage;
     GO

When I run it, I get a list of origin cities and a column for percentage, as expected, but the percentage is always 0. I'm still pretty confused about subqueries (as you can see).


Solution

  • I would do this using AVG() as a window function:

    SELECT F.origin_city as origin_city, 
           AVG( CASE WHEN F2.actual_time < 3 THEN 100.0 ELSE 0 END) as percentage
    FROM Flights F
    GROUP BY F.origin_city
    ORDER BY percentage;
    

    This assumes that the time is measured in hours. According to Google Maps, you can walk from Dothan to Atlanta in 68 hours, so the 171 is suspicious.