Search code examples
sqlpostgresqlrelational-division

Simplify and/or optimize sql query with INTERSECT or HAVING


I have the following data:

 tag_id | taggable_id 
--------+-------------
      1 |           1
      2 |           1
      3 |           1
      4 |           1
      1 |           2
      2 |           2
      5 |           2
      6 |           2
      7 |           3
      8 |           3
      9 |           3
     10 |           3

And I want to bring all the taggable_ids which are in a group of tag_ids AND in another group AND another... (max 4 groups).
For example:

  • if I search all the taggable_ids with tag_id 1 or 7 AND tag_id 4 should return just 1
  • if I search all the taggable_ids with tag_id 1 AND tag_id 6 AND tag_id 2 or 8 should return just 2
  • if I search all the taggable_ids with tag_id 8 AND tag_id 5 should not return ids

In a gross way(for the second example), the query is the following:

SELECT taggable_id FROM taggings WHERE tag_id in (1)
INTERSECT
SELECT taggable_id FROM taggings WHERE tag_id in (6)
INTERSECT
SELECT taggable_id FROM taggings WHERE tag_id in (2,8)

I think simplifying it, it looks like:

SELECT taggable_id FROM taggings WHERE tag_id in (1,2,6,8)
GROUP BY taggable_id HAVING COUNT(*)=3

but I'm wondering if can be done in a simpler way. Any thoughts?


Solution

  • This can be cast as a case of relational division. We have assembled an arsenal of query techniques under this related question:

    Depending on data distribution and other factors, this may be fastest:

    SELECT DISTINCT taggable_id
    FROM   taggings t1
    JOIN   taggings t2 USING (taggable_id)
    JOIN   taggings t3 USING (taggable_id)
    WHERE  t1.tag_id = 1
    AND    t2.tag_id = 6
    AND    t3.tag_id IN (2, 8);
    

    Assuming unique (tag_id, taggable_id), DISTINCT is actually not needed for the example. But it might be necessary with other (list) predicates.

    SQL Fiddle (building on @Clodoaldo's, thanks).