Search code examples
sqldatabasepostgresqlquery-optimizationcase-insensitive

Remove case insensitive duplicates in sql (postgres)


I have a postgresql database, and I'm trying to delete (or even just get the ids) of the older of the duplicates I have in my table, but only those who are because of case sensitivity, for example helLo and hello.

The table is quite large and my nested query takes a really long time, I wonder if there is a better, more efficient way to do my query in one go, and not split it up to multiple queries, cause there's a lot of ids in question

SELECT * FROM some_table AS out
WHERE (SELECT count(*) FROM some_table AS in
    WHERE out.text != in.text 
    AND LOWER(in.text) = LOWER(out.text) 
    AND in.created_at > out.created_at) > 1

Thanks!


Solution

  • Can you try

    SELECT LOWER(text), ROW_NUMBER() OVER( PARTITION by LOWER(text) ORDER by created_at ) as rn
    FROM some_table 
    

    You can then use the rn column as a filter