Search code examples
javasqlpostgresqlgreatest-n-per-group

SQL distinct multiple columns


I have a table as following:

book_id   author_id     mark     year
   1          1          BAD     2014
   1          1         MEDIUM   2014
   1          1         GREAT    2015

I would like to execute a query that will give me the best book for each author. Something like this:

book_id   author_id     mark     year
   1          1         GREAT    2015

I tried to use the distinct keyword on multiple fields - but when I do this:

select distinct book_id, author_id from Books 

I get only the book_id and the author_id (as expected) - but I also need the mark and the year - but I cannot add it to the distinct phrase.

Currently I'm using Postgres 9.4 but I need an ANSI-SQL solution.

Is there a way I can do that?


Solution

  • questions are usually solved using window functions:

    select *
    from (
       select book_id, author_id, mark, year, 
              row_number() over (partition by author_id order by case mark when 'GREAT' then 1 when 'MEDIUM' then 2 else 3 end) as rn
       from books
    ) t
    where rn = 1;
    

    The above is standard ANSI SQL, but in Postgres using the (proprietary) distinct on is usually much faster:

    select distinct on (author_id) book_id, author_id, mark, year, 
    from books
    order by author_id, 
             case mark when 'GREAT' then 1 when 'MEDIUM' then 2 else 3 end