I am trying to write a query (PostgreSQL) to get "Movies with highest number of awards in year 2012."
I have following tables:
CREATE TABLE Award(
ID_AWARD bigserial CONSTRAINT Award_pk PRIMARY KEY,
award_name VARCHAR(90),
category VARCHAR(90),
award_year integer,
CONSTRAINT award_unique UNIQUE (award_name, category, award_year));
CREATE TABLE AwardWinner(
ID_AWARD integer,
ID_ACTOR integer,
ID_MOVIE integer,
CONSTRAINT AwardWinner_pk PRIMARY KEY (ID_AWARD));
And I written following query, which gives correct results, but there's is quite a lot of code duplication I think.
select * from
(select id_movie, count(id_movie) as awards
from Award natural join awardwinner
where award_year = 2012 group by id_movie) as SUB
where awards = (select max(count) from
(select id_movie, count(id_movie)
from Award natural join awardwinner
where award_year = 2012 group by id_movie) as SUB2);
So SUB
and SUB2
are exactly the same subquery. Is there a better way to do this?
SELECT id_movie, awards
FROM (
SELECT aw.id_movie, count(*) AS awards
,rank() OVER (ORDER BY count(aw.id_movie) DESC) AS rnk
FROM award a
JOIN awardwinner aw USING (id_award)
WHERE a.award_year = 2012
GROUP BY aw.id_movie
) sub
WHERE rnk = 1;
This should be simpler and faster than suggestions so far. Test with EXPLAIN ANALYZE
.
There are cases where CTEs are instrumental to avoid code duplication. But not in this time: a subquery does the job just fine and is usually faster.
You can run a window function OVER an aggregate function on the same query level. That's why this works:
rank() OVER (ORDER BY count(aw.id_movie) DESC) AS rnk
I'd suggest to use explicit column names in the JOIN condition instead of NATURAL JOIN
, which is prone to breakage if you later change / add columns to the underlying tables.
The JOIN condition with USING
is almost as short, but doesn't break as easily.
Since id_movie
cannot be NULL (ruled out by the JOIN condition and also part of the pk) it is shorter ans slightly faster to use count(*)
instead. Same result.
Shorter and faster, yet, if you only need one winner:
SELECT aw.id_movie, count(*) AS awards
FROM award a
JOIN awardwinner aw USING (id_award)
WHERE a.award_year = 2012
GROUP BY 1
ORDER BY 2 DESC, 1 -- as tie breaker
LIMIT 1
Using positional references (1
, 2
) here as shorthand.
I added id_movie
to ORDER BY
as tie breaker in case multiple movies should qualify for the win.