Given this table structure and example data (t3 should not be used in the query, it is only here to show the relation between t1 and t2):
t1 t2 t3
-------------- ----------------- --------------------------------
| id | value | | t1key | t3key | | id | value |
| 1 | 2008 | | 3 | 1 | | 1 | "New intel cpu in 2010" |
| 2 | 2009 | | 4 | 1 | | 2 | "New amd cpu in 2008" |
| 3 | 2010 | | 6 | 1 | | | ... |
| 4 | intel | | 1 | 2 | --------------------------------
| 5 | amd | | 5 | 2 |
| 6 | cpu | | 6 | 2 |
| | ... | | | ... |
-------------- -----------------
How would you build a SQL query that would satisfy the following:
Given the input for t1.id is the set {6} returns t1.id set {3,4,6,1,5}
Given the input for t1.id is the set {6,4} returns t1.id set {3,4,6}
Given the input for t1.id is the set {5,4} returns t1.id set {}
and doesn't kill performance when the tables are bigger...?
It's not very clear what you want.
I will call table t1 word
, call table t3 phrase
and call table t2 word is in phrase
.
Then I guess you want to find all word.ids that are in a same phrase as a specific set of word.ids. Is that correct?
SELECT DISTINCT t1.id
FROM t1
JOIN t2
ON t1.id = t2.t1key
JOIN t2 copyt2
ON copyt2.t3key = t2.t3key
WHERE copyt2.t1key IN
(6,4) --what you want to check here
CORRECTION
Reading Joe's comment and re-reading the question details, I guess you want to find all words that appear in same phrase with ALL words in your specified list.
This looks like a relational division problem:
SELECT DISTINCT t2a.t1key
FROM t2 AS t2a
WHERE NOT EXISTS
( SELECT *
FROM t2 AS t2b
WHERE t2b.t1key IN (6,4)
AND NOT EXISTS
( SELECT *
FROM t2 AS t2c
WHERE t2a.t3key = t2c.t3key
AND t2c.t1key = t2b.t1key
)
)
2nd solution:
SELECT a.t1key
FROM t2 AS a
JOIN t2 as b
ON a.t3key = b.t3key
WHERE b.t1key IN (6,4) --list you want to check
GROUP BY a.t1key, a.t3key
HAVING COUNT(*) = 2 --size of list
;
3rd solution:
SELECT DISTINCT t1key
FROM t2
WHERE t3key IN
( SELECT t3key
FROM t2
WHERE t1key IN (6,4)
GROUP BY t3key
HAVING COUNT(*) = 2
)
;
Note: The first (with NON EXISTS
) solution has a great difference with the other two:
If you try it with a list that its members do not appear in table t2, say (2)
or (2,7)
, it will show ALL t1key's from t2.
The 2nd and 3rd solutions will show NO keys at all in such a case.