I got file information in one table and file_id and checksums stored in other one. Now i want to search for duplicate files in table with checksums
SELECT A.file_id pid, B.file_id sid
FROM checksums A INNER JOIN checksums B ON A.checksum = B.checksum
WHERE A.file_id <> B.file_id
ORDER BY pid
Result I got is fairly good and can be used after further processing. What I'm wondering about is if and how I could get right result straight from query. I mean here to get rid of "cross duplicates" in these two columns. for example if File with id 1 got duplicates with Files that got id 43,77,14 and id 2 with duplicate in file with id 74 then i get following result
| pid | sid |
=============
| 1 | 43 |
| 1 | 77 |
| 1 | 14 |
| 2 | 74 |
...
| 14 | 1 |
...
| 43 | 1 |
...
| 74 | 2 |
...
| 77 | 1 |
...
And what I want to achieve to get rid of one of these opposite pairs to avoid duplicate results, is there way in SQL to do it?
I think an inequality condition should be sufficient:
SELECT a.file_id pid, b.file_id sid
FROM checksums a
INNER JOIN checksums b ON b.checksum = a.checksum
WHERE A.file_id < B.file_id
ORDER BY pid