Tags: sql, postgresql, activerecord, relational-division

Get the count of rows after GROUP BY


Here is the code I use in a Ruby on Rails project to find residences that have all of the amenities with the ids 48, 49 and 50. The models are connected through a has_many :through association.

id_list = [48, 49, 50]
Residence.joins(:listed_amenities).
          where(listed_amenities: {amenity_id: id_list}).
          group('residences.id').
          having("count(listed_amenities.*) = ?", id_list.size)

The resulting SQL:

SELECT "residences".*
FROM "residences"
INNER JOIN "listed_amenities" ON "listed_amenities"."residence_id" = "residences"."id"
WHERE "listed_amenities"."amenity_id" IN (48, 49, 50)
GROUP BY residences.id
HAVING count(listed_amenities.*) = 3

I'm interested in the number of residences that result from this query. Is there a way to add a count or something else to let the database do that calculation? I don't want to waste computing power by doing it in Ruby. Adding a .count method doesn't work: because of the GROUP BY it returns a hash of per-group counts, {528747=>3, 529004=>3, 529058=>3}, instead of a single number.


Solution

  • If your design enforces referential integrity, you don't have to join to the table residences for this purpose at all. The queries below also assume a UNIQUE or PRIMARY KEY constraint on (residence_id, amenity_id); without it you need different queries.
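
    To see why the uniqueness assumption matters, here is a minimal plain-Ruby model of the HAVING condition. The rows are made-up sample data, not taken from the question, and the code only imitates what the database would do. With a duplicate row, counting rows lets a residence with only two distinct amenities slip through; counting distinct amenities (in SQL, count(DISTINCT amenity_id)) does not.

```ruby
# Hypothetical rows of listed_amenities as [residence_id, amenity_id] pairs.
# Residence 1 lists amenity 48 twice, which a UNIQUE constraint would forbid.
rows = [
  [1, 48], [1, 48], [1, 49],   # only two distinct amenities, but three rows
  [2, 48], [2, 49], [2, 50]    # all three amenities
]
wanted = [48, 49, 50]

matching = rows.select { |_, amenity_id| wanted.include?(amenity_id) }
by_residence = matching.group_by { |residence_id, _| residence_id }

# Like HAVING count(*) = 3: counts rows, so the duplicate inflates the count.
by_row_count = by_residence.select { |_, rs| rs.size == wanted.size }.keys

# Like HAVING count(DISTINCT amenity_id) = 3: counts distinct amenities.
by_distinct_count = by_residence.select { |_, rs|
  rs.map { |_, amenity_id| amenity_id }.uniq.size == wanted.size
}.keys

p by_row_count       # [1, 2]
p by_distinct_count  # [2]
```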

    The best query depends on what you need exactly.

    Using a window function, you can even do this in a single query level:

    SELECT count(*) OVER () AS ct
    FROM   listed_amenities
    WHERE  amenity_id IN (48, 49, 50)
    GROUP  BY residence_id
    HAVING count(*) = 3
    LIMIT  1;
    

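    This window function appends the total count to every row without collapsing rows. It works because of the sequence of events in a SELECT query: window functions are evaluated after GROUP BY and HAVING, so count(*) OVER () counts the groups that survive the HAVING clause, and LIMIT is applied last.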
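
    The order of evaluation can be modeled in a few lines of plain Ruby. This is only an illustration with made-up sample rows, not something to run instead of the query; the variable names are hypothetical.

```ruby
# Hypothetical rows of listed_amenities as [residence_id, amenity_id] pairs;
# each pair is unique, matching the assumed UNIQUE constraint.
rows = [
  [1, 48], [1, 49], [1, 50],
  [2, 48], [2, 49],
  [3, 48], [3, 49], [3, 50], [3, 51]
]
wanted = [48, 49, 50]

# 1. WHERE: keep only rows for the wanted amenities.
filtered = rows.select { |_, amenity_id| wanted.include?(amenity_id) }

# 2. GROUP BY residence_id, with count(*) per group.
counts = filtered.group_by { |residence_id, _| residence_id }
                 .transform_values(&:size)

# 3. HAVING count(*) = 3: keep groups that have all wanted amenities.
qualifying = counts.select { |_, n| n == wanted.size }.keys

# 4. Window function count(*) OVER (): runs over the rows left after HAVING,
#    so every result row carries the same total.
result = qualifying.map { |id| { residence_id: id, ct: qualifying.size } }

# 5. LIMIT 1: applied last, so it does not change ct.
limited = result.first(1)
puts limited.first[:ct]  # 2
```

    Note that if no group passes step 3, there is no row left to carry the count, so the query form with LIMIT 1 returns no row rather than 0.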

    Accordingly, you could use a similar query to return all qualifying IDs (or even whole rows) and append the count to every row (redundantly):

    SELECT residence_id, count(*) OVER () AS ct
    FROM   listed_amenities
    WHERE  amenity_id IN (48, 49, 50)
    GROUP  BY residence_id
    HAVING count(*) = 3;
    

    But a subquery is the better choice; it is typically much cheaper:

    SELECT count(*) AS ct
    FROM  (
       SELECT 1
       FROM   listed_amenities
       WHERE  amenity_id IN (48, 49, 50)
       GROUP  BY residence_id 
       HAVING count(*) = 3
       ) sub;
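
    Since the question starts from a Ruby id_list, here is a rough sketch of how the same statement might be generated for any list of ids. The helper name residence_count_sql is hypothetical, and the table and column names are taken from the SQL in the question.

```ruby
# Hypothetical helper: builds the subquery-count SQL for a list of amenity ids.
# Integer() raises on non-numeric input, so only integers reach the SQL text.
def residence_count_sql(amenity_ids)
  ids = amenity_ids.map { |id| Integer(id) }.uniq
  raise ArgumentError, "need at least one amenity id" if ids.empty?

  <<~SQL
    SELECT count(*) AS ct
    FROM  (
       SELECT 1
       FROM   listed_amenities
       WHERE  amenity_id IN (#{ids.join(', ')})
       GROUP  BY residence_id
       HAVING count(*) = #{ids.size}
       ) sub
  SQL
end

puts residence_count_sql([48, 49, 50])
```

    In a Rails app, the resulting string could presumably be executed with something like ActiveRecord::Base.connection.select_value(sql), which returns the first column of the first row; that call is not exercised in this sketch.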
    

    You could return an array of IDs (as opposed to the set above) at the same time, for hardly any more cost:

    SELECT array_agg(residence_id) AS ids, count(*) AS ct
    FROM  (
       SELECT residence_id 
       FROM   listed_amenities
       WHERE  amenity_id IN (48, 49, 50)
       GROUP  BY residence_id
       HAVING count(*) = 3
       ) sub;
    

    There are many other variants; which one fits depends on the expected result, which you would have to clarify. For example, this one uses a self-join per amenity:

    SELECT count(*) AS ct
    FROM   listed_amenities l1
    JOIN   listed_amenities l2 USING (residence_id)
    JOIN   listed_amenities l3 USING (residence_id)
    WHERE  l1.amenity_id = 48
    AND    l2.amenity_id = 49
    AND    l3.amenity_id = 50;
    

    Basically it's a case of relational division. We have assembled an arsenal of techniques here: