I am using SQLite and I have the following table, x, which contains about 300k rows, here is a sample:
name | surname | nickname |
---|---|---|
Jeniffer | Doe | Jenny |
Jeniffer | Doe | NULL |
Jeniffer | Doe | Jenny |
Each row has Name, Surname and Nickname values, where Nickname can also be NULL. None of the values are unique and there is also no key. What I want to do is delete the "duplicates" where Nickname is NULL, while grouping them by the Name-Surname pair but also delete the "actual" duplicates.
This would be the expected output:
name | surname | nickname |
---|---|---|
Jeniffer | Doe | Jenny |
I have tried using the following query:
select x.*
from (select x.*,
row_number() over (partition by name, surname order by nickname nulls last) as seqnum
from x)
) x
where seqnum = 1;
but unfortunately I get an error:
near "(": syntax error
I have searched for days now looking for an answer and I still cannot figure it out. I am using SQLite3, if it matters. I still am not sure why is there syntax error, I assume it has something to do with the row_number()
, but according to my search on the web, it should be supported with sqlite3 (In my case it is not as far as I understood it)
So now I am looking for a way to modify this query in order to get the desired output, but I am clueless for days now...
Any help would be greatly appreciated!
Unless you have created the table WITHOUT ROWID, there is a column rowid
in the table that acts like the primary key.
You can use aggregation to get the min rowid
of each combination of name
and surname
that should not be deleted:
DELETE FROM tablename
WHERE rowid NOT IN (
SELECT COALESCE(
MIN(CASE WHEN nickname IS NOT NULL THEN rowid END),
MIN(rowid)
)
FROM tablename
GROUP BY name, surname
);
See the demo.