Search code examples
sqlpostgresqljoin

How to join only one row in joined table with postgres?


I have the following schema:

CREATE TABLE author (
    id   integer
  , name varchar(255)
);
CREATE TABLE book (
    id        integer
  , author_id integer
  , title     varchar(255)
  , rating    integer
);

And I want each author with its last book:

SELECT book.id, author.id, author.name, book.title as last_book
FROM author
JOIN book book ON book.author_id = author.id

GROUP BY author.id
ORDER BY book.id ASC

Apparently you can do that in mysql: Join two tables in MySQL, returning just one row from the second table.

But postgres gives this error:

ERROR: column "book.id" must appear in the GROUP BY clause or be used in an aggregate function: SELECT book.id, author.id, author.name, book.title as last_book FROM author JOIN book book ON book.author_id = author.id GROUP BY author.id ORDER BY book.id ASC

It's because:

When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column.

How can I specify to postgres: "Give me only the last row, when ordered by joined_table.id, in the joined table ?"


Edit: With this data:

INSERT INTO author (id, name) VALUES
  (1, 'Bob')
, (2, 'David')
, (3, 'John');

INSERT INTO book (id, author_id, title, rating) VALUES
  (1, 1, '1st book from bob', 5)
, (2, 1, '2nd book from bob', 6)
, (3, 1, '3rd book from bob', 7)
, (4, 2, '1st book from David', 6)
, (5, 2, '2nd book from David', 6);

I should see:

book_id author_id name    last_book
3       1         "Bob"   "3rd book from bob"
5       2         "David" "2nd book from David"

Solution

  • select distinct on (author.id)
        book.id, author.id, author.name, book.title as last_book
    from
        author
        inner join
        book on book.author_id = author.id
    order by author.id, book.id desc
    

    Check distinct on

    SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.

    With distinct on it is necessary to include the "distinct" columns in the order by. If that is not the order you want then you need to wrap the query and reorder

    select 
        *
    from (
        select distinct on (author.id)
            book.id, author.id, author.name, book.title as last_book
        from
            author
            inner join
            book on book.author_id = author.id
        order by author.id, book.id desc
    ) authors_with_first_book
    order by authors_with_first_book.name
    

    Another solution is to use a window function as in Lennart's answer. And another very generic one is this

    select 
        book.id, author.id, author.name, book.title as last_book
    from
        book
        inner join
        (
            select author.id as author_id, max(book.id) as book_id
            from
                author
                inner join
                book on author.id = book.author_id
            group by author.id
        ) s
        on s.book_id = book.id
        inner join
        author on book.author_id = author.id