Search code examples
sqlpostgresqlleft-joinaggregate

PostgreSQL subquery COUNT fails when the subquery is joined more than once


I have 2 tables:

Table class:

id serial4 PRIMARY KEY
name varchar(64)
code varchar(64)

Table class_event, where I store events related to classes, such as "started" and "ended".

id serial4
class_id int4 NOT NULL  // ->  FK to the class table
event_type varchar(1) NOT NULL  // -> 's' for started, 'e' for ended.

I need a query the amount of times each class has been started and ended. This works:

select
    c.code,
    c.name,
    count(started.id) "started"
from "class" c
left join (select id, class_id, event_type from "class_event" where event_type = 's') started 
    on started.klass_id = c.id
group by c.code, c.name
order by started desc;

But when I do exactly the same to get the amount of ended classes it shows incorrect amounts:

select
    c.code,
    c.name,
    count(started.id) "started",
    count(ended.id) "ended"
from "class" c
left join (select id, class_id, event_type from "class_event" where event_type = 's') started 
    on started.klass_id = c.id
left join (select id, class_id, event_type from "class_event" where event_type = 'e') ended 
    on ended.klass_id = c.id
group by c.code, c.name
order by started desc;

Also, the query takes significantly more time to execute. Is there anything I'm missing?


Solution

  • Is there anything I'm missing?

    Yes, multiple joins multiply rows. It's exactly the same problem as discussed here:

    While you query the whole table it's typically cleaner and faster to aggregate first and join later. See:

    This also avoids the original problem on principle, even for multiple joins - which we don't need.

    SELECT * 
    FROM   class c
    LEFT   JOIN (
       SELECT class_id AS id
            , count(*) FILTER (WHERE event_type = 's') AS started
            , count(*) FILTER (WHERE event_type = 'e') AS ended
       FROM   class_event
       GROUP  BY 1
       ) e  USING (id)
    ORDER  BY e.started DESC NULLS LAST;
    

    NULLS LAST because it's conceivable that some of the classes have no related rows in table class_event (yet), and the resulting NULL values surely shouldn't sort on top. See:

    About the aggregate FILTER clause:

    Aside:

    For just a hand full of allowed values, I would consider the data type "char" instead of varchar(1) for event_type. See: