sql postgresql activerecord relational-division

Get the count of rows after GROUP BY

Here is the code I use in a Ruby on Rails project to find residences that have the amenities with the ids 48, 49 and 50. The models are connected through a has_many :through association.

id_list = [48, 49, 50]
Residence.joins(:listed_amenities).
          where(listed_amenities: {amenity_id: id_list}).
          group('residences.id').
          having("count(listed_amenities.*) = ?", id_list.size)

The resulting SQL:

SELECT "residences".*
FROM "residences"
INNER JOIN "listed_amenities" ON "listed_amenities"."residence_id" = "residences"."id"
WHERE "listed_amenities"."amenity_id" IN (48, 49, 50)
GROUP BY residences.id
HAVING count(listed_amenities.*) = 3

I'm interested in the number of residences this query returns. Is there a way to add a count, or something similar, so that the database does the calculation? I don't want to waste computing power by doing it in Ruby. Appending .count doesn't work: it returns a hash with one count per group, {528747=>3, 529004=>3, 529058=>3}.

Solution

If your design enforces referential integrity, you don't have to join to the table residences for this purpose at all. I am also assuming a UNIQUE or PK constraint on (residence_id, amenity_id); otherwise you need different queries!
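Why the uniqueness assumption matters: counting rows per residence equals counting distinct amenities only when no (residence_id, amenity_id) pair can repeat. The following is a small plain-Ruby sketch over hypothetical in-memory rows; it mimics the SQL logic and is not ActiveRecord code. The distinct-count variant (what count(DISTINCT amenity_id) would do in SQL) is shown only as one possible fix for data without the constraint.

```ruby
# Hypothetical listed_amenities rows as [residence_id, amenity_id] pairs.
# Residence 2 lists amenity 48 twice and lacks 50; a UNIQUE or PK constraint
# on (residence_id, amenity_id) would have rejected the duplicate.
rows = [
  [1, 48], [1, 49], [1, 50],
  [2, 48], [2, 48], [2, 49]
]
required = [48, 49, 50]

# WHERE amenity_id IN (48, 49, 50) GROUP BY residence_id
groups = rows.select { |_, amenity| required.include?(amenity) }
             .group_by(&:first)

# HAVING count(*) = 3 counts rows, so the duplicate fakes a match
naive = groups.select { |_, grp| grp.size == required.size }.keys

# Counting distinct amenities per residence removes the false positive
distinct = groups.select { |_, grp| grp.map(&:last).uniq.size == required.size }.keys

p naive    # => [1, 2]
p distinct # => [1]
```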

The best query depends on what you need exactly.

Using a window function, you can even do this in a single query level:

SELECT count(*) OVER () AS ct
FROM   listed_amenities
WHERE  amenity_id IN (48, 49, 50)
GROUP  BY residence_id
HAVING count(*) = 3
LIMIT  1;

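Window functions are evaluated after GROUP BY and HAVING, so count(*) OVER () counts the remaining groups (one per qualifying residence) and appends that total to every row without aggregating rows any further. LIMIT 1 is applied after that and does not change the count. Consider the sequence of events in a SELECT query: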

Best way to get result count before LIMIT was applied
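That evaluation order can be mimicked step by step in plain Ruby. This is only a sketch over hypothetical in-memory rows, not how PostgreSQL executes the query internally; it illustrates why count(*) OVER () counts groups rather than amenity rows, and why LIMIT 1 does not affect the result.

```ruby
# Hypothetical listed_amenities rows as [residence_id, amenity_id] pairs,
# assuming (residence_id, amenity_id) is unique.
rows = [
  [1, 48], [1, 49], [1, 50], [1, 7],
  [2, 48], [2, 49], [2, 50],
  [3, 48], [3, 49], [3, 50],
  [4, 48], [4, 49] # lacks amenity 50
]
required = [48, 49, 50]

# 1. WHERE amenity_id IN (48, 49, 50): drops [1, 7]
filtered = rows.select { |_, amenity| required.include?(amenity) }
# 2. GROUP BY residence_id
groups = filtered.group_by(&:first)
# 3. HAVING count(*) = 3: drops residence 4
kept = groups.select { |_, grp| grp.size == required.size }.keys
# 4. count(*) OVER (): the window sees only the rows left after HAVING
#    and appends their total to every row
with_ct = kept.map { |id| { residence_id: id, ct: kept.size } }
# 5. LIMIT 1 runs last, so ct already reflects all qualifying residences
limited = with_ct.first(1)

puts limited.first[:ct] # => 3
```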

Accordingly, you could use a similar query to return all qualifying IDs (or even whole rows) and append the count to every row (redundantly):

SELECT residence_id, count(*) OVER () AS ct
FROM   listed_amenities
WHERE  amenity_id IN (48, 49, 50)
GROUP  BY residence_id
HAVING count(*) = 3;

But a subquery is typically much cheaper:

SELECT count(*) AS ct
FROM  (
   SELECT 1
   FROM   listed_amenities
   WHERE  amenity_id IN (48, 49, 50)
   GROUP  BY residence_id
   HAVING count(*) = 3
   ) sub;
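There is one practical difference between the two styles when no residence qualifies. The window-function query emits one row per surviving group, so it returns no row at all, while the outer count(*) over the subquery is a plain aggregate and always returns exactly one row (ct = 0). The following Ruby sketch models that behavior; window_style and subquery_style are hypothetical names standing in for the two queries, and their argument is the list of residence IDs that survive HAVING.

```ruby
# count(*) OVER () ... LIMIT 1: one output row per group, so no groups => no row
def window_style(kept)
  kept.map { kept.size }.first(1)
end

# SELECT count(*) FROM (...) sub: a plain aggregate always yields one row
def subquery_style(kept)
  [kept.size]
end

p window_style([1, 2, 3])   # => [3]
p subquery_style([1, 2, 3]) # => [3]
p window_style([])          # => []
p subquery_style([])        # => [0]
```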

You can also return an array of the qualifying IDs (instead of a set of rows, as above) together with the count, at hardly any extra cost:

SELECT array_agg(residence_id) AS ids, count(*) AS ct
FROM  (
   SELECT residence_id
   FROM   listed_amenities
   WHERE  amenity_id IN (48, 49, 50)
   GROUP  BY residence_id
   HAVING count(*) = 3
   ) sub;

There are many other variants; which one fits best depends on the exact result you expect. For example:

SELECT count(*) AS ct
FROM   listed_amenities l1
JOIN   listed_amenities l2 USING (residence_id)
JOIN   listed_amenities l3 USING (residence_id)
WHERE  l1.amenity_id = 48
AND    l2.amenity_id = 49
AND    l3.amenity_id = 50;
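In the self-join variant, each alias (l1, l2, l3) pins one required amenity, and joining on residence_id keeps only residences that have all three. With the uniqueness of (residence_id, amenity_id) assumed above, each qualifying residence yields exactly one joined row, so count(*) equals the number of residences. The following sketch simulates this in plain Ruby as a filtered cross product over hypothetical rows. The last lines show that a condition filtering l2 twice (l2.amenity_id = 49 AND l2.amenity_id = 50) can never be true.

```ruby
# Hypothetical listed_amenities rows as [residence_id, amenity_id] pairs,
# assuming (residence_id, amenity_id) is unique.
rows = [
  [1, 48], [1, 49], [1, 50],
  [2, 48], [2, 49],
  [3, 48], [3, 49], [3, 50], [3, 51]
]

# l1 JOIN l2 USING (residence_id) JOIN l3 USING (residence_id),
# evaluated naively as a filtered cross product
joined = rows.product(rows, rows).select do |l1, l2, l3|
  l1[0] == l2[0] && l2[0] == l3[0] &&
    l1[1] == 48 && l2[1] == 49 && l3[1] == 50
end
puts joined.size # => 2 (residences 1 and 3)

# Filtering l2 on both 49 and 50 is contradictory, so this always counts 0
buggy = rows.product(rows, rows).count do |l1, l2, l3|
  l1[0] == l2[0] && l2[0] == l3[0] &&
    l1[1] == 48 && l2[1] == 49 && l2[1] == 50
end
puts buggy # => 0
```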

Basically, this is a case of relational division. We have assembled an arsenal of techniques here:

How to filter SQL results in a has-many-through relation
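At its core, relational division asks which residences have all of a given set of amenities. The following is a conceptual sketch only: it uses plain Ruby set containment over hypothetical rows and is not one of the SQL techniques from the linked answer. It just states the "for all" condition directly.

```ruby
require 'set'

# Hypothetical listed_amenities rows as [residence_id, amenity_id] pairs
rows = [
  [1, 48], [1, 49], [1, 50], [1, 7],
  [2, 48], [2, 49],
  [3, 50], [3, 49], [3, 48]
]
required = Set[48, 49, 50]

# residence_id => set of its amenity_ids
by_residence = rows.group_by(&:first)
                   .transform_values { |grp| grp.map(&:last).to_set }

# The "quotient": residences whose amenity set contains every required amenity
quotient = by_residence.select { |_, amenities| required.subset?(amenities) }.keys

p quotient      # => [1, 3]
p quotient.size # => 2
```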