Search code examples
sqlduplicatessql-delete

Removing Partial Duplicates Ignoring Nulls in SQL


I have a table with two possible unique identifiers (ID1 and ID2). Each row will have one or both of these identifiers. The data in each row is exactly the same for each ID, with the exception of a timestamp. I would like to eliminate duplicates from each value, but treat nulls as unique values.

This question: How to delete duplicate rows in sql server?

Referred me to this site: http://www.codaffection.com/sql-server-article/delete-duplicate-rows-in-sql-server/

where I came up with the following query:

WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY ID1 ORDER BY ID1) AS RN
FROM Filings_Search
)

DELETE FROM CTE WHERE RN<>1

Unfortunately, this deleted all my null values as well! How can I modify this query to avoid deleting null values?

Edit: Here is a sample of what my data would look like (if anyone knows how to format tables well, let me know. I used https://senseful.github.io/text-table/).

+------+------+----------+-----------+
| ID1  | ID2  |   Data   | Timestamp |
+------+------+----------+-----------+
| NULL | abc  | macd     | 01:40     |
| NULL | abc  | macd     | 04:23     |
| NULL | def  | pfchangs | 01:41     |
| 123  | NULL | wendys   | 02:42     |
| 123  | NULL | wendys   | 03:45     |
+------+------+----------+-----------+

Running on ID1 would output:

+------+------+----------+-----------+
| ID1  | ID2  |   Data   | Timestamp |
+------+------+----------+-----------+
| NULL | abc  | macd     | 01:40     |
| NULL | abc  | macd     | 04:23     |
| NULL | def  | pfchangs | 01:41     |
| 123  | NULL | wendys   | 02:42     |
+------+------+----------+-----------+

Running on ID2 would output:

+------+------+----------+-----------+
| ID1  | ID2  |   Data   | Timestamp |
+------+------+----------+-----------+
| NULL | abc  | macd     | 01:40     |
| NULL | def  | pfchangs | 01:41     |
| 123  | NULL | wendys   | 02:42     |
| 123  | NULL | wendys   | 03:45     |
+------+------+----------+-----------+

Apologies if this is a duplicate, I am a SQL beginner and couldn't find anything exactly like what I was looking for.


Solution

  • what about:

     DELETE FROM CTE 
     WHERE RN<>1
       AND ID1 IS NOT NULL