Search code examples
sqlsqliteunioncommon-table-expressionconditional-aggregation

Finding duplicate values in multiple colums in a SQL table and count for chars


referring to this question:

Finding duplicate values in multiple colums in a SQL table and count

I have the following table structure:

id name1 name2 name3  ...
 1 Hans  Peter Frank
 2 Hans  Frank Peter
 3 Hans  Peter Frank
 4 Paul  Peter Hans
 .
 .
 .

I use the following command to display the duplicates and the counts:

SELECT COUNT(name1), name1, name2, name3 
FROM table 
GROUP BY name1, name2, name3 
HAVING (COUNT(name1) > 1) AND (COUNT(name2) > 1) AND (COUNT(name3) > 1)

This command gives me a count of 2. I would like to know how the second line could also be counted as a dublicate.

Unfortunately, the solution to the original question (Finding duplicate values in multiple colums in a SQL table and count) does not work for char


Solution

  • First normalize the table with UNION ALL in a CTE to get each of the 3 names in a separate row.
    Then with ROW_NUMBER() window function you can rank alphabetically the 3 names so that you can group by them:

    WITH cte(id, name) AS (
      SELECT id, name1 FROM tablename
      UNION ALL
      SELECT id, name2 FROM tablename
      UNION ALL
      SELECT id, name3 FROM tablename
    )
    SELECT COUNT(*) count, name1, name2, name3
    FROM (
      SELECT id,
             MAX(CASE WHEN rn = 1 THEN name END) name1,
             MAX(CASE WHEN rn = 2 THEN name END) name2,
             MAX(CASE WHEN rn = 3 THEN name END) name3
      FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY name) rn
        FROM cte
      )
      GROUP BY id
    )
    GROUP BY name1, name2, name3
    HAVING COUNT(*) > 1;
    

    Another way to do it, that uses similar logic to your previous question with numeric values, with string function REPLACE() instead of window functions, but works only if the 3 names in each row are different:

    SELECT COUNT(*) count,
           MIN(name1, name2, name3) name_1,
           REPLACE(
             REPLACE(
               REPLACE(name1 || ',' || name2 || ',' || name3, MIN(name1, name2, name3), ''), 
               MAX(name1, name2, name3), ''), ',', ''
           ) name_2,
           MAX(name1, name2, name3) name_3
    FROM tablename 
    GROUP BY name_1, name_2, name_3 
    HAVING COUNT(*) > 1;
    

    See the demo.