I have a SQL Server database table (DuplicateIds) that contains the ID's of duplicate words from another table (Words). Here's and example of the data in the table DuplicateIds:
word_id | word
----------------------------------
244 | ape
603 | ape
1873 | ape
372 | banana
3095 | banana
...and so on. Often there are only two or three duplicates but there are cases with 10 or even more duplicates.
Now I want to use this table with duplicates to create a new table that connects the ID's of the words that are the same. I guess the new table would look something like this:
word_id | connected_id
----------------------------------
244 | 603
244 | 1873
603 | 244
603 | 1873
1873 | 244
1873 | 603
372 | 3095
3095 | 372
With this table I can look up a certain word using its ID and get the ID's of all the words that are the same.
Now I wonder if I can write a (T)SQL statement that generates this new connection table for me, using the data from DuplicateIds?
This should do it:
SELECT
di.word_id
,di2.word_id connected_id
into NewTable
from DuplicateIds di
inner join DuplicateIds di2
on di2.word = di.word
and di2.word_id <> di.word_id