Tags: sql, postgresql, inner-join, greatest-n-per-group

Working with PostgreSQL, how do I list an author's earliest book when there is a bridge table?


I've got three tables:

AUTHOR(auth_id, fname, lname)
BOOKAUTHOR(auth_id, book_id)
BOOK(book_id, book_name, publish_date)

AUTHOR
auth_id  fname   lname
----------------------------
1        Bob     Bobson
2        Sam     Samson
3        Bill    Billson
4        Sally   Sallson
5        Mary    Marson

BOOKAUTHOR
auth_id    book_id
------------------
1          11
1          12
2          13
2          14
3          15
3          16
4          17
4          18
5          19
5          20

BOOK
book_id   book_name     publish_date
-------------------------------------
11        Bob Book 1    2015-06-05
12        Bob Book 2    2020-07-06
13        Sam Book 1    2016-04-03
14        Sam Book 2    2020-09-27
15        Bill Book 1   2013-08-20
16        Bill Book 2   2015-01-16
17        Sall Book 1   2012-06-27
18        Sall Book 2   2018-03-10
19        Mary Book 1   2003-08-01
20        Mary Book 2   2020-06-05

BOOKAUTHOR is a bridge (junction) table linking authors to their books.
I want to return three columns: author_name, name_of_their_first_book, and date_it_was_published.
So far I have:

SELECT fname || ' ' || lname AS author_name, MIN(publish_date) AS publish_date
FROM author a, book b, bookauthor ba
WHERE a.auth_id = ba.auth_id
AND b.book_id = ba.book_id
GROUP BY author_name;
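The comma-separated FROM list, with the matching conditions in WHERE, is an implicit inner join. As a sketch of what that join does with a bridge table, the script below loads the question's sample data into an in-memory SQLite database (an assumption made only so the example runs without a PostgreSQL server; the join SQL itself is standard) and checks that the implicit form and the explicit INNER JOIN form return the same rows.

```python
import sqlite3

# Sample data copied from the question. SQLite (bundled with Python) stands in
# for PostgreSQL here; the join syntax used below is standard SQL.
AUTHORS = [(1, "Bob", "Bobson"), (2, "Sam", "Samson"), (3, "Bill", "Billson"),
           (4, "Sally", "Sallson"), (5, "Mary", "Marson")]
BOOKS = [(11, "Bob Book 1", "2015-06-05"), (12, "Bob Book 2", "2020-07-06"),
         (13, "Sam Book 1", "2016-04-03"), (14, "Sam Book 2", "2020-09-27"),
         (15, "Bill Book 1", "2013-08-20"), (16, "Bill Book 2", "2015-01-16"),
         (17, "Sall Book 1", "2012-06-27"), (18, "Sall Book 2", "2018-03-10"),
         (19, "Mary Book 1", "2003-08-01"), (20, "Mary Book 2", "2020-06-05")]
BOOKAUTHOR = [(1, 11), (1, 12), (2, 13), (2, 14), (3, 15),
              (3, 16), (4, 17), (4, 18), (5, 19), (5, 20)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (auth_id INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
    CREATE TABLE book (book_id INTEGER PRIMARY KEY, book_name TEXT, publish_date TEXT);
    CREATE TABLE bookauthor (auth_id INTEGER, book_id INTEGER);
""")
conn.executemany("INSERT INTO author VALUES (?, ?, ?)", AUTHORS)
conn.executemany("INSERT INTO book VALUES (?, ?, ?)", BOOKS)
conn.executemany("INSERT INTO bookauthor VALUES (?, ?)", BOOKAUTHOR)

# The question's style: comma-separated FROM list, join conditions in WHERE.
implicit_sql = """
    SELECT a.fname || ' ' || a.lname, b.book_name, b.publish_date
    FROM author a, book b, bookauthor ba
    WHERE a.auth_id = ba.auth_id AND b.book_id = ba.book_id
    ORDER BY a.auth_id, b.publish_date
"""
# The same query with explicit INNER JOINs: each ON clause states which rows of
# the newly joined table pair up with the rows accumulated so far.
explicit_sql = """
    SELECT a.fname || ' ' || a.lname, b.book_name, b.publish_date
    FROM author a
    INNER JOIN bookauthor ba ON ba.auth_id = a.auth_id
    INNER JOIN book b ON b.book_id = ba.book_id
    ORDER BY a.auth_id, b.publish_date
"""
implicit_rows = conn.execute(implicit_sql).fetchall()
explicit_rows = conn.execute(explicit_sql).fetchall()

# One row per (author, book) pair in the bridge table.
for row in explicit_rows:
    print(row)
```

Because every BOOKAUTHOR row matches exactly one author and one book, the join yields ten rows, one per author/book pair. Grouping those rows by author collapses each author's pairs into a single row, so only aggregates such as MIN(publish_date) can be reported per author without further work.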

which returns:

author_name    publish_date
------------------------------
Bob Bobson     2015-06-05
Sam Samson     2016-04-03
Bill Billson   2013-08-20
Sally Sallson  2012-06-27
Mary Marson    2003-08-01

But when I try to add the book title, like below:

SELECT a.fname || ' ' || a.lname AS author_name, MIN(b.publish_date) AS publish_date, b.book_name AS latest_book
FROM author a, book b, bookauthor ba
WHERE a.auth_id = ba.auth_id
AND b.book_id = ba.book_id
GROUP BY author_name;

it returns every book by each author, ignoring the MIN(b.publish_date):

author_name    publish_date   latest_book
-------------------------------------------
Bob Bobson     2015-06-05     Bob Book 1
Bob Bobson     2020-07-06     Bob Book 2
Sam Samson     2016-04-03     Sam Book 1
Sam Samson     2020-09-27     Sam Book 2
Bill Billson   2013-08-20     Bill Book 1
Bill Billson   2015-01-16     Bill Book 2
Sally Sallson  2012-06-27     Sall Book 1
Sally Sallson  2018-03-10     Sall Book 2
Mary Marson    2003-08-01     Mary Book 1
Mary Marson    2020-06-05     Mary Book 2

I imagine the solution involves joins somehow, but I haven't quite wrapped my head around them. If it is a join, could you also explain what it is doing?


Solution

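  • You can use joins and DISTINCT ON. The two inner joins pair each author with each of their books through the bridge table, giving one row per author/book pair. DISTINCT ON (a.auth_id), a PostgreSQL extension, then keeps only the first row for each author, where "first" is decided by the ORDER BY, so ordering by b.publish_date keeps each author's earliest book: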

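    select distinct on (a.auth_id)
           a.fname || ' ' || a.lname as author_name,
           b.book_name as name_of_their_first_book,
           b.publish_date as date_it_was_published
    from author a
    inner join bookauthor ba on ba.auth_id = a.auth_id
    inner join book b on b.book_id = ba.book_id
    order by a.auth_id, b.publish_date;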
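For databases without DISTINCT ON, one common greatest-n-per-group pattern is to compute each author's earliest publish date in a derived table and join back to the books on that date. The sketch below is a hedged alternative, not part of the original answer: it uses an in-memory SQLite database as a stand-in for PostgreSQL and the question's sample data, and the alias `f` and column `first_date` are hypothetical names chosen for illustration.

```python
import sqlite3

# Sample data from the question; SQLite stands in for PostgreSQL so the sketch
# runs without a server. Dates are ISO strings, so MIN() on text still picks
# the earliest date.
AUTHORS = [(1, "Bob", "Bobson"), (2, "Sam", "Samson"), (3, "Bill", "Billson"),
           (4, "Sally", "Sallson"), (5, "Mary", "Marson")]
BOOKS = [(11, "Bob Book 1", "2015-06-05"), (12, "Bob Book 2", "2020-07-06"),
         (13, "Sam Book 1", "2016-04-03"), (14, "Sam Book 2", "2020-09-27"),
         (15, "Bill Book 1", "2013-08-20"), (16, "Bill Book 2", "2015-01-16"),
         (17, "Sall Book 1", "2012-06-27"), (18, "Sall Book 2", "2018-03-10"),
         (19, "Mary Book 1", "2003-08-01"), (20, "Mary Book 2", "2020-06-05")]
BOOKAUTHOR = [(1, 11), (1, 12), (2, 13), (2, 14), (3, 15),
              (3, 16), (4, 17), (4, 18), (5, 19), (5, 20)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (auth_id INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
    CREATE TABLE book (book_id INTEGER PRIMARY KEY, book_name TEXT, publish_date TEXT);
    CREATE TABLE bookauthor (auth_id INTEGER, book_id INTEGER);
""")
conn.executemany("INSERT INTO author VALUES (?, ?, ?)", AUTHORS)
conn.executemany("INSERT INTO book VALUES (?, ?, ?)", BOOKS)
conn.executemany("INSERT INTO bookauthor VALUES (?, ?)", BOOKAUTHOR)

first_books_sql = """
    SELECT a.fname || ' ' || a.lname AS author_name,
           b.book_name              AS name_of_their_first_book,
           b.publish_date           AS date_it_was_published
    FROM author a
    INNER JOIN bookauthor ba ON ba.auth_id = a.auth_id
    INNER JOIN book b        ON b.book_id = ba.book_id
    -- Derived table: each author's earliest publish date.
    INNER JOIN (
        SELECT ba2.auth_id, MIN(b2.publish_date) AS first_date
        FROM bookauthor ba2
        INNER JOIN book b2 ON b2.book_id = ba2.book_id
        GROUP BY ba2.auth_id
    ) f ON f.auth_id = a.auth_id AND f.first_date = b.publish_date
    ORDER BY a.auth_id
"""
rows = conn.execute(first_books_sql).fetchall()
for row in rows:
    print(row)
```

Unlike DISTINCT ON, which always keeps exactly one row per author, this version returns every book tied for an author's earliest date. With the sample data there are no ties, so it yields one row per author.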