I am trying to find duplicate records by name and the date submitted.
For example, I want to get all records with same name of "SampleOne" and same submitted date of "2016-10-14 12:44 39 460" and exclude any data that doesn't match.
I am using the query shown below, but I'm getting other data too. From my screenshot below I need only rows 1- 4 and not show the ones that doesn't have the same name and same submitDate
.
SELECT TOP (13)
a.DocumentName, a.SubmitDateTime
FROM
Document AS a
INNER JOIN
(SELECT
DocumentName, SubmitDateTime, COUNT(*) AS DocCount
FROM
Document
GROUP BY
DocumentName, SubmitDateTime
HAVING
(COUNT(*) > 1)) AS dt ON a.DocumentName = dt.DocumentName
AND a.SubmitDateTime = dt.SubmitDateTime
LEFT OUTER JOIN
Staging b ON a.DocumentId = b.DocumentId
WHERE
b.DocumentId IS NULL
AND a.SubmitDateTime IS NOT NULL
AND a.InsertDateTime IS NOT NULL
ORDER BY
a.SubmitDateTime
Sample data
DocumentName | SubmitDateTime |
---|---|
SampleOne | 2016-10-14 12:44:39.460 |
SampleOne | 2016-10-14 12:44:39.460 |
SampleTwo | 2016-10-14 12:44:39.460 |
SampleTwo | 2016-10-14 12:44:39.460 |
SampleThree | 2016-10-14 12:45:32.813 |
SampleFive | 2016-10-14 12:46:23.423 |
Doc Name | 2016-10-17 15:15:05.523 |
Doc Name 2 | 2016-10-17 15:15:32.170 |
Doc Name | 2016-10-17 15:17:32.153 |
Doc Name | 2016-10-17 15:17:53.963 |
Doc Name | 2016-10-17 15:18:15.977 |
Document Text | 2016-10-17 15:19:22.950 |
Document Text | 2016-10-17 15:22:25.463 |
I have managed to get the results that I want by using the query below and needs suggestion if I am on the right track.
SELECT t.*
FROM (
SELECT
s.*
, COUNT(*) OVER (PARTITION BY s.DocumentName, s.SubmitDateTime, s.DocumentType) AS DocCount
FROM Document s
LEFT OUTER JOIN
Staging b ON S.DocumentId = b.DocumentId
WHERE
b.DocumentId IS NULL
AND s.SubmitDateTime IS NOT NULL
AND s.InsertDateTime IS NOT NULL
AND s.DocumentName IS NOT NULL
) t
WHERE t.DocCount> 1
ORDER BY t.DocumentName, t.SubmitDateTime
Since you didn't provide any clear input or output I am assuming the following:
The subquery t
will select only the necessary columns DocumentName
and SubmitDateTime
.
I replaced theLEFT OUTER JOIN
with Staging
with a LEFT JOIN
for simplicity, assuming you don't require columns from the Staging
table in the final result.
In the COUNT(*)
function I included the OVER PARTITION to count duplicates only based on DocumentName
and SubmitDateTime
.
SELECT t.DocumentName, t.SubmitDateTime
FROM (
SELECT s.DocumentName, s.SubmitDateTime, COUNT(*) OVER (PARTITION BY s.DocumentName, s.SubmitDateTime) AS DocCount
FROM Document s
LEFT JOIN Staging b ON s.DocumentId = b.DocumentId
WHERE b.DocumentId IS NULL
AND s.SubmitDateTime IS NOT NULL
AND s.InsertDateTime IS NOT NULL
AND s.DocumentName IS NOT NULL
) t
WHERE t.DocCount > 1
ORDER BY t.DocumentName, t.SubmitDateTime;