I saw the following posted on a basic way to de-dup entries, without explanation of how it works. I see that it works, but I want to know the workings of how it works and the process in which it evaluates. Below I will post the code, and my thoughts. I am hoping that somebody can tell me if my thought process on how this is evaluated step by step is correct, or if I am off, can somebody please break it down for me.
CREATE TABLE #DuplicateRcordTable (Col1 INT, Col2 INT)
INSERT INTO #DuplicateRcordTable
SELECT 1, 1
UNION ALL
SELECT 1, 1
UNION ALL
SELECT 1, 1
UNION ALL
SELECT 1, 2
UNION ALL
SELECT 1, 2
UNION ALL
SELECT 1, 3
UNION ALL
SELECT 1, 4
GO
This returns a basic table:
Then this code is used to exclude duplicates:
SELECT col1,col2
FROM #DuplicateRcordTable
EXCEPT
SELECT col1,col2
FROM #DuplicateRcordTable WHERE 1=0
My understanding is that where 1=0 creates a "temp" table structured the same but has no data.
Does this code then start adding data to the new empty table?
For example does it look at the first Col1, Col2 pair of 1,1 and say "I don't see it in the table" so it adds it to the "temp" table and end result, then checks the next row which is also 1,1 and then sees it already in the "temp" table so its not added to the end result....and so on through the data.
The reason that this works is due to the definition of EXCEPT
which according to the MS docs is
EXCEPT returns distinct rows from the left input query that aren't output by the right input query.
The key word here being distinct
. Putting where 1 = 0
makes the second query return no results, but the EXCEPT
operator itself then reduces the rows from the left query down to those which are distinct
.
As @Gordon Linoff says in his answer, there is a simpler, more straightforward way to accomplish this.
The fact that the example uses the same table in the left and right queries could be misleading, the following query will accomplish the same thing, so long as the values in the right query don't exist in the left:
SELECT col1, col2
FROM @DuplicateRecordTable
EXCEPT
SELECT -1, -1