Search code examples
sqlsql-server-2008selectsubquerylinked-server

Query not listing duplicate records in sql?


I am using below query to find duplicate records

SELECT F.ID,f.CIVILID
FROM DBO.EMPUDF AS F
WHERE f.civilid IN (SELECT CIVILID
                    FROM dBO.EMPUDF
                    GROUP BY CIVILID
                    HAVING COUNT (CIVILID) > 1) 

The subquery returns 218 records which are duplicate records, but when I try to execute the query to return name and id it returns more than 10k records with many null id columns.


Solution

  • Well, you are dealing with duplicates, so there are a lot of them.

    You can get an idea of how many are duplicated by doing a histogram of histograms:

    SELECT cnt, COUNT(*), MIN(CIVILID), MAX(CIVILID)
    FROM (SELECT CIVILID, COUNT(*) as cnt
          FROM  dBO.EMPUDF
          GROUP BY CIVILID
         ) c
    GROUP BY cnt
    ORDER BY cnt;
    

    Also, your original query is more simply written using window functions:

    SELECT e.*
    FROM (SELECT e.*, COUNT(*) OVER (PARTITION BY CIVILID) as cnt
          FROM  dBO.EMPUDF
         ) e
    WHERE cnt > 1;