Search code examples
phpmysqlsql-deletepartial-matches

MYSQL how to delete duplicates based on match after a hyphen


I have a list of names in my table as follows:

paul-jones
Ron-Thomas
John-Doe
Michael-Adams
Jim-Jones
Adam-Jones
Bob-Thomas
Bill-Thomas

I would like to delete all rows where last name after hypen is duplicate but would like to keep oldest row for each last name. For example in the above data set I would like to delete

    Jim-Jones
    Adam-Jones
    Bob-Thomas
    Bill-Thomas

and keep

paul-jones
Ron-Thomas
John-Doe
Michael-Adams 

and so on.

I found the below query to extract all charecter after hyphen like this select SUBSTRING_INDEX(full_name,'-',-1) from names; but couldn't figure out how to update last_name column in my table with these values, so that I can delete based on unique values in my last_name column using this query maybe

("ALTER IGNORE TABLE names ADD UNIQUE (`last_name`)") ;

I hope this clearly explains my question. Thanks for the help.


Solution

  • You can use the SUBSTRING_INDEX in your UPDATE query to update last_name.

    UPDATE product
    SET last_name = SUBSTRING_INDEX(supplier_reference,'-',-1)
    WHERE SUBSTRING_INDEX(supplier_reference,'-',-1) != ''
    AND last_name = '';
    

    WHERE clause will ensure that the query updates last name only if there is a value after hyphen.

    And then, if you want to delete duplicate rows but retain oldest one based on last_name,

    DELETE FROM product
    WHERE id NOT IN (
        SELECT MIN(id)
        FROM product
        GROUP BY last_name
    )
    

    If you have auto-incremented value id then MIN(id) will ensure oldest record is retained. To test, instead of DELETE try SELECT * FROM product and verify if these are the records you want deleted.

    Please note, this query will also delete duplicate rows where last_name is empty. If you don't want that, then add a clause WHERE last_name != '' in the subquery.