Search code examples
sqlsql-servert-sql

Return duplicate rows from a table


I have a table which contains invalid duplicate records and I am trying to find the same. The below is the query, which returns the result.

SELECT Id,
       StartDate,
       EndDate,
       Rate,
       TDate,
       COUNT(*) AS Record
FROM TableA
GROUP BY Id,
         StartDate,
         EndDate,
         Rate,
         TDate
HAVING COUNT(*)>1

Now, there is a requirement. One of the Rate is 'NP', which is correctly supposed to have duplicates. It only becomes an invalid/incorrect if a duplicate 'NP' record is accompanied by any other duplicated Rate 'X' for the same ID and TDate.

How do I go about finding the same? After going through the comments, I realize I'm not able to explain fully what the issue Im facing is. The time part is necessary to get the duplicates, meaning, it's not a duplicate of the time is different even though the TDate is the same, hence why I grouped by both startdate and enddate in my query. Example Id 4 has 3 data of Rate X, but only 1 of them is duplicated.

Id StartDate EndDate Rate Tdate
1 19-06-2024 12:00 19-06-2024 14:00 NP 19-06-2024
1 19-06-2024 12:00 19-06-2024 14:00 NP 19-06-2024
2 17-06-2024 12:00 17-06-2024 14:00 NP 17-06-2024
2 17-06-2024 12:00 17-06-2024 14:00 NP 17-06-2024
2 17-06-2024 14:00 17-06-2024 16:00 P 17-06-2024
2 17-06-2024 14:00 17-06-2024 16:00 P 17-06-2024
3 16-06-2024 12:00 16-06-2024 14:00 NP 16-06-2024
3 16-06-2024 12:00 16-06-2024 14:00 NP 16-06-2024
3 16-06-2024 10:00 16-06-2024 12:00 P 16-06-2024
3 16-06-2024 10:00 16-06-2024 12:00 P 16-06-2024
3 16-06-2024 08:00 16-06-2024 10:00 X 16-06-2024
3 16-06-2024 08:00 16-06-2024 10:00 X 16-06-2024
4 15-06-2024 08:00 15-06-2024 10:00 P 15-06-2024
4 15-06-2024 08:00 15-06-2024 10:00 P 15-06-2024
4 15-06-2024 12:00 15-06-2024 14:00 X 15-06-2024
4 15-06-2024 12:00 15-06-2024 14:00 X 15-06-2024
4 15-06-2024 15:00 15-06-2024 16:00 X 15-06-2024

Here, we can see that Id 1 is duplicated with Rate NP. But for the same TDate and Id, 1 does not have any other duplicated data of Rate X. Hence this is a valid record.

Id 2, has two duplicates of rate NP along with another duplicate of Rate P. So the duplicated rate P, is invalid. But duplicated rate NP, is valid since it is not accompanied by Rate X.

For Id 3, it has duplicates of NP, P and X. Thus for this Id 3, it should return all 3 as invalid.

Id 4, has two duplicates of Rate P and X, so both are invalid.

So from the data above, I am expecting results as:

Id StartDate EndDate Rate Tdate
2 17-06-2024 14:00 17-06-2024 16:00 P 17-06-2024
3 16-06-2024 12:00 16-06-2024 14:00 NP 16-06-2024
3 16-06-2024 10:00 16-06-2024 12:00 P 16-06-2024
3 16-06-2024 08:00 16-06-2024 10:00 X 16-06-2024
4 15-06-2024 08:00 15-06-2024 10:00 P 15-06-2024
4 15-06-2024 12:00 15-06-2024 14:00 X 15-06-2024

Solution

  • Here is one solution using a common table expression (CTE) to perform the grouping 1 time and self join on the results.

    --Duplicated records (exclude records with NP rates unless X rate duplicates in same group)
    WITH RecordsWithDuplicates as
    ( 
      --Duplicated records
      SELECT Id, StartDate, EndDate, Rate, Tdate
      FROM  Example
      GROUP BY ID, Tdate,  StartDate, EndDate, Rate
      HAVING COUNT(*) > 1
    )
    SELECT d.id, FORMAT(d.StartDate,'dd-MM-yyyy HH:mm') as StartDate, FORMAT(d.EndDate,'dd-MM-yyyy HH:mm') as EndDate,  d.Rate, FORMAT(d.Tdate,'dd-MM-yyyy') as Tdate
    FROM RecordsWithDuplicates d
    LEFT JOIN RecordsWithDuplicates x on x.id=d.id and x.TDate=d.Tdate and x.Rate='X'
    WHERE d.Rate!='NP' OR (d.Rate='NP' AND x.id IS NOT NULL)
    

    fiddle

    Id StartDate EndDate Rate Tdate
    2 17-06-2024 14:00 17-06-2024 16:00 P 17-06-2024
    3 16-06-2024 12:00 16-06-2024 14:00 NP 16-06-2024
    3 16-06-2024 10:00 16-06-2024 12:00 P 16-06-2024
    3 16-06-2024 08:00 16-06-2024 10:00 X 16-06-2024
    4 15-06-2024 08:00 15-06-2024 10:00 P 15-06-2024
    4 15-06-2024 12:00 15-06-2024 14:00 X 15-06-2024