Search code examples
sqlinner-joinderived-table

Inner join and derived table


I have been trying to solve Q9 from SQL Movie-Rating Query Exercises (Find the difference between the average rating of movies released before 1980 and the average rating of movies released after 1980. ). Due to the fact that almost every movie has more than one rating (stars) I need to calculate first average rating of each one. I'm doing this by this code:

SELECT 
     AVG(rating.stars) 
FROM Rating 
        INNER JOIN movie ON rating.mid = movie.mid 
GROUP BY 
     rating.mID 
HAVING 
     year < 1980

In the next step I put above code into the main query that will calculate difference between average rating before 1980 and after 1980. But first I want to see if everything will work well. Let's find out:

SELECT
    before.AVG_before1980
FROM 
    (
        SELECT 
        AVG(rating.stars) as AVG_before1980
        FROM Rating 
            INNER JOIN movie ON rating.mid = movie.mid 
        GROUP BY rating.mID 
        HAVING year < 1980
    ) AS before

Output:

AVG_before1980
     3.0
     2.5
     4.5

Everything works fine so next step is to JOIN subquery which do same but after 1980 year:

SELECT
    before.AVG_before1980, after.AVG_after1980
FROM 
    (
        SELECT 
            AVG(rating.stars) as AVG_before1980
        FROM 
            Rating 
            INNER JOIN 
                    movie ON rating.mid = movie.mid 
        GROUP BY 
            rating.mID 
        HAVING 
            year < 1980
    ) AS before
    INNER JOIN
    (
        SELECT 
            AVG(rating.stars) as AVG_after1980
        FROM 
            Rating 
            INNER JOIN 
                    movie ON rating.mid = movie.mid 
        GROUP BY 
            rating.mID 
        HAVING 
            year > 1980
    ) AS after

Unfortunately output is not as I expected:

AVG_before_1980      AVG_after1980
     2.5                 4.0
     2.5                 3.33333333333333
     2.5                 2.5
     4.0                 4.0
     4.0                 3.33333333333333
     4.0                 2.5
     3.33333333333333    4.0
     3.33333333333333    3.33333333333333

I thought it would be:

AVG_before1980   AVG_after1980
    3.0              2.5
    2.5              4.0
    4.5          3.33333333333333

Question is simple: What I have to do to get above output? It would be nice if someone could explain me more what am I doing wrong. And please feel free to improve my code.

All needed tabels: https://lagunita.stanford.edu/c4x/DB/SQL/asset/moviedata.html


Solution

  • You can use conditional aggregation to get the average of ratings on movies before and after 1980:

    SELECT AVG(CASE WHEN m.year < 1980 THEN r.stars END) as avg_pre1980,
           AVG(CASE WHEN m.year > 1980 THEN r.stars END) as avg_post1980
    FROM Rating r INNER JOIN
         movie m
         ON r.mid = m.mid ;
    

    To get the average for movies, you need to aggregate first by movie and then overall:

    SELECT AVG(CASE WHEN m.year < 1980 THEN avg_stars END) as avg_pre1980,
           AVG(CASE WHEN m.year > 1980 THEN avg_stars END) as avg_post1980
    FROM (SELECT m.id, year, AVG(r.stars) as avg_stars
          FROM Rating r INNER JOIN
               movie m
               ON r.mid = m.mid
          GROUP BY m.id, m.year
         ) my
    

    You don't specify your database, but some use integer arithmetic for division and averages, so you should convert to a non-integer number to get a more accurate average: AVG(r.stars * 1.0).