Search code examples
sqlpostgresqlaggregate-functionssqlx

WHERE on INNER JOIN table but include all rows that match on right table


I have 2 tables with a 1-to-many relationship. Table A contains 'OBJECTS' and table B contains 'COMPONENTS' have id's that are related and I wrote a query to gather all OBJECTS from table A and associated COMPONENTS from table B that looks like this:

SELECT A.*, ARRAY_AGG(B.*)
FROM A INNER JOIN B USING(id)
GROUP BY A.*

This gives me what I expect.

Next I have a field on B called 'type' that can be 'X', 'Y', or 'Z'. I want to do the same query above, but filter the results such that only OBJECTS that have associated COMPONENTS where B.type = 'X'. Furthermore, I still want all associated COMPONENTS that aren't B.type = 'X' for each returned OBJECT.

My first attempt was to write the following query:

SELECT A.*, ARRAY_AGG(B.*)
FROM A INNER JOIN B USING(id)
WHERE B.type = 'X'
GROUP BY A.*

But this query strips COMPONENTS that aren't type = 'X'.

Is there a way to get all OBJECTS that have associated COMPONENTS with type = 'X' but also the result set returns all associated COMPONENTS for each object regardless of type?

Here's an example dataset with the expected result set:

A:

id
1
2
3
4

B:

id type
1 X
1 Y
1 Z
2 Y
2 Z
3 X
4 Z
Result:
(1, [(1,X),(1,Y),(1,Z)]),
(3, [(3,X)])

These queries are part of an SQLX Rust project, if that makes a difference.


Solution

  • While referential integrity is enforced, and you query for all IDs, you don't even have to include table A in the query:

    SELECT id, array_agg(B.*)
    FROM   B
    GROUP  BY id
    HAVING bool_or(true) FILTER (WHERE type = 'X');
    

    About the aggregate FILTER clause:

    Should be as fast as it gets - unless type = 'X' is rare.
    In which case - assuming B(type, id) is unique:

    SELECT id, array_agg(B.*)
    FROM  (SELECT id FROM B WHERE type = 'X') b1
    JOIN   B USING (id)
    GROUP  BY 1;
    

    With one index on (type, id) - could be the UNIQUE constraint with leading type.
    (Better yet, a partial index on (id) WHERE type = 'X', but that may be too specialized.)
    And another one on just (id), which should be a given.

    If B(type, id) is not unique, add DISTINCT to the subquery or switch to IN or EXISTS, like:

    SELECT id, array_agg(B.*)
    FROM   B
    WHERE  EXISTS (SELECT FROM B b1 WHERE b1.id = b.id AND b1.type = 'X')
    GROUP  BY 1;