Search code examples
sqlpostgresqlgroup-bycount

Join and group by based on a value of a column to filter


Let's say I have two tables with these data:

author
id | name
---------
1  | John
2  | Jack
3  | James
4  | Bob
5  | Gorge

book
id | author_id | title  | state
-------------------------------------
1  | 1         | book_1 | published
2  | 1         | book_2 | pre-published
3  | 1         | book_3 | published
4  | 1         | book_4 | rejected
5  | 2         | book_5 | published
6  | 3         | book_6 | rejected
7  | 3         | book_7 | pre-published
8  | 4         | book_8 | rejected

What I want is a sql query that gives me each of author's id and name and number of published book even if they have 0 published. I think it might be an easy query but I stuck in it, what I want is:

id | name   | num_of_published_books
------------------------------------
1  | John   |          2
2  | Jack   |          1
3  | James  |          0
4  | Bob    |          0
5  | Gorge  |          0

I'm able to fetch first two rows John and Jack. and also I am able to group them based on author_id and state. but it was not what I want.

note that I don't want to use subquery.


Solution

  • SELECT
      a.id AS author_id,
      a.name,
      COUNT(b.state = 'published' OR NULL) AS num_of_published_books
    FROM
      author a
    LEFT JOIN
      book b ON a.id = b.author_id
    GROUP BY
      a.id, a.name;
    

    Hope it will help you.


    1. b.state = 'published', we can filter out books that have already been published, the result is false(0) or true(1).
    2. OR NULL is to get NULL when the result of the previous step is false(0). In my memory, when using OR operator to compare non-null value and NULL value, the result will always be NULL. COUNT will NULL is ignored.
    3. There are two possible results: true(1) OR NULL and false(0) OR NULL, and their results are true(1) and NULL respectively.
    4. Finally, Count counts the number that is not NULL.