I have a table which contains invalid duplicate records and I am trying to find the same. The below is the query, which returns the result.
SELECT Id,
StartDate,
EndDate,
Rate,
TDate,
COUNT(*) AS Record
FROM TableA
GROUP BY Id,
StartDate,
EndDate,
Rate,
TDate
HAVING COUNT(*)>1
Now, there is a requirement. One of the Rate is 'NP', which is correctly supposed to have duplicates. It only becomes an invalid/incorrect if a duplicate 'NP' record is accompanied by any other duplicated Rate 'X' for the same ID and TDate.
How do I go about finding the same? After going through the comments, I realize I'm not able to explain fully what the issue Im facing is. The time part is necessary to get the duplicates, meaning, it's not a duplicate of the time is different even though the TDate is the same, hence why I grouped by both startdate and enddate in my query. Example Id 4 has 3 data of Rate X, but only 1 of them is duplicated.
Id | StartDate | EndDate | Rate | Tdate |
---|---|---|---|---|
1 | 19-06-2024 12:00 | 19-06-2024 14:00 | NP | 19-06-2024 |
1 | 19-06-2024 12:00 | 19-06-2024 14:00 | NP | 19-06-2024 |
2 | 17-06-2024 12:00 | 17-06-2024 14:00 | NP | 17-06-2024 |
2 | 17-06-2024 12:00 | 17-06-2024 14:00 | NP | 17-06-2024 |
2 | 17-06-2024 14:00 | 17-06-2024 16:00 | P | 17-06-2024 |
2 | 17-06-2024 14:00 | 17-06-2024 16:00 | P | 17-06-2024 |
3 | 16-06-2024 12:00 | 16-06-2024 14:00 | NP | 16-06-2024 |
3 | 16-06-2024 12:00 | 16-06-2024 14:00 | NP | 16-06-2024 |
3 | 16-06-2024 10:00 | 16-06-2024 12:00 | P | 16-06-2024 |
3 | 16-06-2024 10:00 | 16-06-2024 12:00 | P | 16-06-2024 |
3 | 16-06-2024 08:00 | 16-06-2024 10:00 | X | 16-06-2024 |
3 | 16-06-2024 08:00 | 16-06-2024 10:00 | X | 16-06-2024 |
4 | 15-06-2024 08:00 | 15-06-2024 10:00 | P | 15-06-2024 |
4 | 15-06-2024 08:00 | 15-06-2024 10:00 | P | 15-06-2024 |
4 | 15-06-2024 12:00 | 15-06-2024 14:00 | X | 15-06-2024 |
4 | 15-06-2024 12:00 | 15-06-2024 14:00 | X | 15-06-2024 |
4 | 15-06-2024 15:00 | 15-06-2024 16:00 | X | 15-06-2024 |
Here, we can see that Id 1 is duplicated with Rate NP. But for the same TDate and Id, 1 does not have any other duplicated data of Rate X. Hence this is a valid record.
Id 2, has two duplicates of rate NP along with another duplicate of Rate P. So the duplicated rate P, is invalid. But duplicated rate NP, is valid since it is not accompanied by Rate X.
For Id 3, it has duplicates of NP, P and X. Thus for this Id 3, it should return all 3 as invalid.
Id 4, has two duplicates of Rate P and X, so both are invalid.
So from the data above, I am expecting results as:
Id | StartDate | EndDate | Rate | Tdate |
---|---|---|---|---|
2 | 17-06-2024 14:00 | 17-06-2024 16:00 | P | 17-06-2024 |
3 | 16-06-2024 12:00 | 16-06-2024 14:00 | NP | 16-06-2024 |
3 | 16-06-2024 10:00 | 16-06-2024 12:00 | P | 16-06-2024 |
3 | 16-06-2024 08:00 | 16-06-2024 10:00 | X | 16-06-2024 |
4 | 15-06-2024 08:00 | 15-06-2024 10:00 | P | 15-06-2024 |
4 | 15-06-2024 12:00 | 15-06-2024 14:00 | X | 15-06-2024 |
Here is one solution using a common table expression (CTE) to perform the grouping 1 time and self join on the results.
--Duplicated records (exclude records with NP rates unless X rate duplicates in same group)
WITH RecordsWithDuplicates as
(
--Duplicated records
SELECT Id, StartDate, EndDate, Rate, Tdate
FROM Example
GROUP BY ID, Tdate, StartDate, EndDate, Rate
HAVING COUNT(*) > 1
)
SELECT d.id, FORMAT(d.StartDate,'dd-MM-yyyy HH:mm') as StartDate, FORMAT(d.EndDate,'dd-MM-yyyy HH:mm') as EndDate, d.Rate, FORMAT(d.Tdate,'dd-MM-yyyy') as Tdate
FROM RecordsWithDuplicates d
LEFT JOIN RecordsWithDuplicates x on x.id=d.id and x.TDate=d.Tdate and x.Rate='X'
WHERE d.Rate!='NP' OR (d.Rate='NP' AND x.id IS NOT NULL)
Id | StartDate | EndDate | Rate | Tdate |
---|---|---|---|---|
2 | 17-06-2024 14:00 | 17-06-2024 16:00 | P | 17-06-2024 |
3 | 16-06-2024 12:00 | 16-06-2024 14:00 | NP | 16-06-2024 |
3 | 16-06-2024 10:00 | 16-06-2024 12:00 | P | 16-06-2024 |
3 | 16-06-2024 08:00 | 16-06-2024 10:00 | X | 16-06-2024 |
4 | 15-06-2024 08:00 | 15-06-2024 10:00 | P | 15-06-2024 |
4 | 15-06-2024 12:00 | 15-06-2024 14:00 | X | 15-06-2024 |