Search code examples
sqlsql-servert-sqlsql-server-2000no-duplicates

Selecting with preference in SQL Server


I have a table in SQL Server 2000 with data similar to the following:

ReferenceNumber    ReferenceValue
00001              Not assigned
00002              Not assigned
00002              ABCDE

in which each ReferenceNumber can appear multiple times in the table, either with a ReferenceValue of 'Not assigned' or a true ReferenceValue.

I want to dump the data into a cleaned-up table with only one row per ReferenceNumber and a true ReferenceValue if it exists, or 'Not assigned' if there are no true ReferenceValues.

I can see how to do it with two queries:

SELECT TOP 1 ReferenceNumber, ReferenceValue
INTO clean
FROM duplicates
WHERE ReferenceValue <> 'Not assigned'

INSERT INTO clean(ReferenceNumber, ReferenceValue)
SELECT TOP 1 ReferenceNumber, ReferenceValue
WHERE ReferenceValue = 'Not assigned' 
AND ReferenceNumber NOT IN (SELECT ReferenceNumber FROM clean)

but I'm thinking there must be a better way. Any ideas?


Solution

  • Something like this:

    SELECT 
      ReferenceNumber
    , ReferenceValue = ISNULL(MAX(NULLIF(ReferenceValue,'Not assigned')),'Not assigned')
    INTO Table1_Clean
    FROM Table1
    GROUP BY
      ReferenceNumber
    

    MAX() ignores NULLs, so convert whatever you don't want to NULL first, then MAX(), then convert NULLs back to a dummy value.

    One pass, in-line, can't get much more efficient.