I have been trying to solve Q9 from SQL Movie-Rating Query Exercises (Find the difference between the average rating of movies released before 1980 and the average rating of movies released after 1980. ). Due to the fact that almost every movie has more than one rating (stars) I need to calculate first average rating of each one. I'm doing this by this code:
SELECT
AVG(rating.stars)
FROM Rating
INNER JOIN movie ON rating.mid = movie.mid
GROUP BY
rating.mID
HAVING
year < 1980
In the next step I put above code into the main query that will calculate difference between average rating before 1980 and after 1980. But first I want to see if everything will work well. Let's find out:
SELECT
before.AVG_before1980
FROM
(
SELECT
AVG(rating.stars) as AVG_before1980
FROM Rating
INNER JOIN movie ON rating.mid = movie.mid
GROUP BY rating.mID
HAVING year < 1980
) AS before
Output:
AVG_before1980
3.0
2.5
4.5
Everything works fine so next step is to JOIN subquery which do same but after 1980 year:
SELECT
before.AVG_before1980, after.AVG_after1980
FROM
(
SELECT
AVG(rating.stars) as AVG_before1980
FROM
Rating
INNER JOIN
movie ON rating.mid = movie.mid
GROUP BY
rating.mID
HAVING
year < 1980
) AS before
INNER JOIN
(
SELECT
AVG(rating.stars) as AVG_after1980
FROM
Rating
INNER JOIN
movie ON rating.mid = movie.mid
GROUP BY
rating.mID
HAVING
year > 1980
) AS after
Unfortunately output is not as I expected:
AVG_before_1980 AVG_after1980
2.5 4.0
2.5 3.33333333333333
2.5 2.5
4.0 4.0
4.0 3.33333333333333
4.0 2.5
3.33333333333333 4.0
3.33333333333333 3.33333333333333
I thought it would be:
AVG_before1980 AVG_after1980
3.0 2.5
2.5 4.0
4.5 3.33333333333333
Question is simple: What I have to do to get above output? It would be nice if someone could explain me more what am I doing wrong. And please feel free to improve my code.
All needed tabels: https://lagunita.stanford.edu/c4x/DB/SQL/asset/moviedata.html
You can use conditional aggregation to get the average of ratings on movies before and after 1980:
SELECT AVG(CASE WHEN m.year < 1980 THEN r.stars END) as avg_pre1980,
AVG(CASE WHEN m.year > 1980 THEN r.stars END) as avg_post1980
FROM Rating r INNER JOIN
movie m
ON r.mid = m.mid ;
To get the average for movies, you need to aggregate first by movie and then overall:
SELECT AVG(CASE WHEN m.year < 1980 THEN avg_stars END) as avg_pre1980,
AVG(CASE WHEN m.year > 1980 THEN avg_stars END) as avg_post1980
FROM (SELECT m.id, year, AVG(r.stars) as avg_stars
FROM Rating r INNER JOIN
movie m
ON r.mid = m.mid
GROUP BY m.id, m.year
) my
You don't specify your database, but some use integer arithmetic for division and averages, so you should convert to a non-integer number to get a more accurate average: AVG(r.stars * 1.0)
.