Search code examples
sqlsqliteduplicatesinner-joinself-join

How to get rid of duplicated pairs of id's from two column query result


I got file information in one table and file_id and checksums stored in other one. Now i want to search for duplicate files in table with checksums

SELECT A.file_id pid, B.file_id sid
FROM checksums A INNER JOIN checksums B ON A.checksum = B.checksum
WHERE A.file_id <> B.file_id
ORDER BY pid

Result I got is fairly good and can be used after further processing. What I'm wondering about is if and how I could get right result straight from query. I mean here to get rid of "cross duplicates" in these two columns. for example if File with id 1 got duplicates with Files that got id 43,77,14 and id 2 with duplicate in file with id 74 then i get following result

| pid | sid |
=============
|  1  |  43 |
|  1  |  77 |
|  1  |  14 |
|  2  |  74 |
...
|  14 |   1 |
...
|  43 |   1 |
...
|  74 |   2 |
...
|  77 |   1 |
...

And what I want to achieve to get rid of one of these opposite pairs to avoid duplicate results, is there way in SQL to do it?


Solution

  • I think an inequality condition should be sufficient:

    SELECT a.file_id pid, b.file_id sid
    FROM checksums a 
    INNER JOIN checksums b ON b.checksum = a.checksum
    WHERE A.file_id < B.file_id
    ORDER BY pid