Search code examples
sqldatabasesimilarity

Find similarities in tables


I have a table with thousands of records:

+-----+-------+------+----------+
| id  | group | code |  label   |
+-----+-------+------+----------+
| 1   | abc   | 345  | myLabel1 |
| 2   | abc   | 345  | myLabel4 |
| 3   | def   | 987  | myLabel2 |
| 4   | def   | 345  | myLabel4 |
| 5   | def   | 987  | myLabel2 |
| 6   | def   | 231  | myLabel1 |
| 7   | def   | 231  | myLabel9 |
| 8   | abc   | 345  | myLabel3 |
| 9   | def   | 987  | myLabel2 |
| 10  | def   | 345  | myLabel1 |
| ... | ...   | ...  | ...      |
+-----+-------+------+----------+

In this table some entries are similar to each other.

For example 1,2 and 8: group is "abc" and code is "345".
Another example are entries 3, 5 and 9: group is "def", code is "987" and label is "myLabel2".

Formally one could say:

IF group = "abc", THEN code = "345"
IF group = "def" AND code = "987", THEN label = "myLabel2 "

I want to run a program over my tables which finds those similarities.

PS: This table is just an example. It actually has more than 20 columns with different names.


Solution

  • This will give you duplicate rows in table with number of time occurrence

    SELECT      [group], [code] , COUNT(1) as CNT
    FROM        YourTableName
    GROUP BY    [group], [code] 
    HAVING      COUNT(1) > 1
    ORDER BY CNT;