Search code examples
sqlpostgresqlcase-insensitivededuplication

postgresql: Finding the ids of rows that contain case-insensitive string duplication


I want to select and then delete a list of entries in my tables that have case-insensitive duplications.

In other words, there are these rows that are unique... ..but they're not unique if you ignore case factor in case. They got in while I wasn't watching.

So how can I select against the column to find the ids that I should delete? (I'm fine with deleting both duplications).

simple sample column structure:

player_id | uname
------------------
34        | BOB
544       | bob
etc...

Solution

  • Players to keep (assuming they registered first)

    SELECT min(player_id) as player_id
    FROM players
    GROUP BY lower(uname)
    

    Use it to dislay the users to remove and their corresponding keeper.

    SELECT 
        players.player_id delete_id,
        players.uname delete_uname,
        keepers.uname keeper_uname,
        keepers.player_id keeper_id    
    FROM players JOIN 
        (
            SELECT p.player_id, p.uname
            FROM players p JOIN
            (
                SELECT min(player_id) player_id
                  FROM players
              GROUP BY lower(uname)
            ) as keeper_ids
            ON (p.player_id = keeper_ids.player_id)     
        ) as keepers
        ON (lower(players.uname) = lower(keepers.uname) AND players.player_id <> keepers.player_id)
    ORDER BY keepers.player_id, players.player_id 
    

    Output:

    delete_id | delete_uname | keeper_uname | keeper_id
    ---------------------------------------------------
    544       | bob          | BOB          | 34