Search code examples
mysqlselectset-theorysubqueryrelational-division

Is there a way to remove the nested query in this type of SQL SELECT?


Given this table structure and example data (t3 should not be used in the query, it is only here to show the relation between t1 and t2):

      t1                 t2                         t3
--------------   -----------------   --------------------------------
| id | value |   | t1key | t3key |   | id | value                   |
|  1 |  2008 |   |     3 |     1 |   |  1 | "New intel cpu in 2010" |
|  2 |  2009 |   |     4 |     1 |   |  2 | "New amd cpu in 2008"   |
|  3 |  2010 |   |     6 |     1 |   |    |                     ... |
|  4 | intel |   |     1 |     2 |   --------------------------------
|  5 |   amd |   |     5 |     2 |
|  6 |   cpu |   |     6 |     2 |
|    |   ... |   |       |   ... |
--------------   -----------------

How would you build a SQL query that would satisfy the following:

Given the input for t1.id is the set {6} returns t1.id set {3,4,6,1,5}
Given the input for t1.id is the set {6,4} returns t1.id set {3,4,6}
Given the input for t1.id is the set {5,4} returns t1.id set {}

and doesn't kill performance when the tables are bigger...?


Solution

  • It's not very clear what you want.

    I will call table t1 word, call table t3 phrase and call table t2 word is in phrase.

    Then I guess you want to find all word.ids that are in a same phrase as a specific set of word.ids. Is that correct?

    SELECT DISTINCT t1.id
    FROM t1 
      JOIN t2
        ON t1.id = t2.t1key
      JOIN t2 copyt2
        ON copyt2.t3key = t2.t3key 
    WHERE copyt2.t1key IN
      (6,4)       --what you want to check here
    

    CORRECTION

    Reading Joe's comment and re-reading the question details, I guess you want to find all words that appear in same phrase with ALL words in your specified list.

    This looks like a relational division problem:

    SELECT DISTINCT t2a.t1key
    FROM t2 AS t2a
    WHERE NOT EXISTS
      ( SELECT *
        FROM t2 AS t2b
        WHERE t2b.t1key IN (6,4)
          AND NOT EXISTS
          ( SELECT *
            FROM t2 AS t2c
            WHERE t2a.t3key = t2c.t3key
              AND t2c.t1key = t2b.t1key
          )
      )
    

    2nd solution:

    SELECT a.t1key
    FROM t2 AS a
      JOIN t2 as b
        ON  a.t3key = b.t3key
    WHERE b.t1key IN (6,4)       --list you want to check
    GROUP BY a.t1key, a.t3key
    HAVING COUNT(*) = 2          --size of list
    ;
    

    3rd solution:

    SELECT DISTINCT t1key
    FROM t2
    WHERE t3key IN
      ( SELECT t3key
        FROM t2
        WHERE t1key IN (6,4)
        GROUP BY t3key
        HAVING COUNT(*) = 2
      )
    ;
    

    Note: The first (with NON EXISTS) solution has a great difference with the other two:

    If you try it with a list that its members do not appear in table t2, say (2) or (2,7), it will show ALL t1key's from t2.

    The 2nd and 3rd solutions will show NO keys at all in such a case.