Search code examples
sqlitegroup-bynullduplicatessql-delete

SQL - delete duplicates with null values WITHOUT row_number


I am using SQLite and I have the following table, x, which contains about 300k rows, here is a sample:

name surname nickname
Jeniffer Doe Jenny
Jeniffer Doe NULL
Jeniffer Doe Jenny

Each row has Name, Surname and Nickname values, where Nickname can also be NULL. None of the values are unique and there is also no key. What I want to do is delete the "duplicates" where Nickname is NULL, while grouping them by the Name-Surname pair but also delete the "actual" duplicates.

This would be the expected output:

name surname nickname
Jeniffer Doe Jenny

I have tried using the following query:

select x.* 
from (select x.*, 
          row_number() over (partition by name, surname order by nickname nulls last) as seqnum
     from x)
     ) x
where seqnum = 1;

but unfortunately I get an error:

near "(": syntax error

I have searched for days now looking for an answer and I still cannot figure it out. I am using SQLite3, if it matters. I still am not sure why is there syntax error, I assume it has something to do with the row_number(), but according to my search on the web, it should be supported with sqlite3 (In my case it is not as far as I understood it)

So now I am looking for a way to modify this query in order to get the desired output, but I am clueless for days now...

Any help would be greatly appreciated!


Solution

  • Unless you have created the table WITHOUT ROWID, there is a column rowid in the table that acts like the primary key.

    You can use aggregation to get the min rowid of each combination of name and surname that should not be deleted:

    DELETE FROM tablename
    WHERE rowid NOT IN (
      SELECT COALESCE(
               MIN(CASE WHEN nickname IS NOT NULL THEN rowid END),
               MIN(rowid)
             )  
      FROM tablename 
      GROUP BY name, surname
    );
    

    See the demo.