Search code examples
sqlpostgresqlsubqueryrelational-division

Is "WHERE (subquery) IN (subquery)" possible?


We need to select the Tests that only use the tags connected to Machine.

  • Tags are many to many to Tests. (TagTest associative table)

  • Tags are many to many to Machines. (TagMachine associative table)

Examples:

  • If a Test has tags [A,B,C] and Machine has [A,B,D] the test should not be chosen because its tags are not a subset of Machine's tags.

  • If a Test has tags [A,B] and Machine has [A,B,D] the test should be included.

  • If a Test has no tags it should always be included.

Something like this construct should work:

SELECT *
FROM Test te
WHERE 
    (SELECT tt.tagId
    FROM TagTest tt 
    WHERE tt.testId = te.Id)
 IN
    (SELECT tm.tagId
    FROM TagMachine tm
    WHERE tm.machineId = 123)

However is this kind of query possible? If not, how is it possible to achieve the desired result?


Solution

  • IN() can't do this by itself. You can make two CTE's and JOIN them together, but it's still a bit tricky.

    Instead, let's turn the problem around. Instead of looking for records that match all good tags, we can look for records that are missing any one required tag. From the first example in the question ([A,B,C] vs [A,B,D]), we're looking for the TestTag records with the C tag. Once we have this information, we can use it in a subquery to exclude all Test records with an Id that appeared in those results.

    So the first thing to do is use an exclusion join to find TestTag results where the corresponding TagMachine record is missing:

    SELECT tt.testId, tt.tagId
    FROM TestTag tt 
    LEFT JOIN TagMachine tm ON tm.machineId = 123 AND tm.tagId = tt.tagId
    WHERE tm.tagId IS NULL
    

    The presence of any testId in the results of the above query makes the Test with that Id ineligible... but we do want all other Test records. So now just limit this to DISTINCT testId and use it as a subquery in any of an exclusion join, NOT IN(), or NOT EXISTS(). Take your pick:

    SELECT * 
    FROM Tests
    WHERE Id NOT IN (
         --identify tests hat are missing at least one tag
         SELECT DISTINCT tt.testId 
         FROM TestTag tt 
         LEFT JOIN TagMachine tm ON tm.machineId = 123 AND tm.tagId = tt.tagId
         WHERE tm.tagId IS NULL)