sql postgresql aggregate-functions window-functions

Find movies with highest number of awards in certain year - code duplication

I am trying to write a query (PostgreSQL) to get "Movies with highest number of awards in year 2012."

I have following tables:

CREATE TABLE Award(
    ID_AWARD bigserial CONSTRAINT Award_pk PRIMARY KEY,
    award_name VARCHAR(90),
    category VARCHAR(90),
    award_year integer,
    CONSTRAINT award_unique UNIQUE (award_name, category, award_year));

CREATE TABLE AwardWinner(
    ID_AWARD integer,
    ID_ACTOR integer,
    ID_MOVIE integer,
    CONSTRAINT AwardWinner_pk PRIMARY KEY (ID_AWARD));

And I written following query, which gives correct results, but there's is quite a lot of code duplication I think.

select * from 
(select id_movie, count(id_movie) as awards 
from Award natural join awardwinner 
where award_year = 2012 group by id_movie) as SUB
where awards = (select max(count) from 
(select id_movie, count(id_movie) 
from Award natural join awardwinner 
where award_year = 2012 group by id_movie) as SUB2);

So SUB and SUB2 are exactly the same subquery. Is there a better way to do this?

Solution

Get all winning movies

SELECT id_movie, awards
FROM  (
   SELECT aw.id_movie, count(*) AS awards
         ,rank() OVER (ORDER BY count(aw.id_movie) DESC) AS rnk
   FROM   award       a
   JOIN   awardwinner aw USING (id_award)
   WHERE  a.award_year = 2012
   GROUP  BY aw.id_movie
   ) sub
WHERE  rnk = 1;

Major points

This should be simpler and faster than suggestions so far. Test with EXPLAIN ANALYZE.
There are cases where CTEs are instrumental to avoid code duplication. But not in this time: a subquery does the job just fine and is usually faster.
You can run a window function OVER an aggregate function on the same query level. That's why this works:
```
rank() OVER (ORDER BY count(aw.id_movie) DESC) AS rnk
```
I'd suggest to use explicit column names in the JOIN condition instead of NATURAL JOIN, which is prone to breakage if you later change / add columns to the underlying tables.
The JOIN condition with USING is almost as short, but doesn't break as easily.
Since id_movie cannot be NULL (ruled out by the JOIN condition and also part of the pk) it is shorter ans slightly faster to use count(*) instead. Same result.

Just one movie

Shorter and faster, yet, if you only need one winner:

SELECT aw.id_movie, count(*) AS awards
FROM   award       a
JOIN   awardwinner aw USING (id_award)
WHERE  a.award_year = 2012
GROUP  BY 1
ORDER  BY 2 DESC, 1 -- as tie breaker
LIMIT  1

Using positional references (1, 2) here as shorthand.
I added id_movie to ORDER BY as tie breaker in case multiple movies should qualify for the win.