Search code examples
mysqlsqldeduplication

Mysql deduplicate records in single query


I have the following table:

CREATE TABLE `relations` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `relationcode` varchar(25) DEFAULT NULL,
  `email_address` varchar(100) DEFAULT NULL,
  `firstname` varchar(100) DEFAULT NULL,
  `latname` varchar(100) DEFAULT NULL,
  `last_contact_date` varchar(25) DEFAULT NULL,
  PRIMARY KEY (`id`)
)

In this table there are duplicates, these are relation with exact the same relationcode and email_address. They can be in there twice or even 10 times. I need a query that selects the id's of all records, but excludes the ones that are in there more than once. Of those records, I only would like to select the record with the most recent last_contact_id only.

I'm more into Oracle than Mysql, In Oracle I would be able to do it this way:

select * from (
    select row_number () over (partition by relationcode order by to_date(last_contact_date,'dd-mm-yyyy')) rank, 
           id, 
           relationcode,  
           email_address , 
           last_contact_date
    from RELATIONS)
where rank = 1

But I can't figure out how to modify this query to work in MySql. I'm not even dure it's possible to do the same thing in a single query in MySQl. Any ideas?


Solution

  • Here is a method that will work in both MySQL and Oracle. It rephrases the question as: Get me all rows from relations where the relationcode has no larger last_contact_date.

    It works something like this:

    select r.*
    from relations r
    where not exists (select 1
                      from relations r2
                      where r2.relationcode = r.relationcode and
                            r2.last_contact_date > r.last_contact_date
                     );
    

    With the appropriate indexes, this should be pretty efficient in both databases.

    Note: This assumes that last_contact_date is stored as a date not as a string (as in your table example). Storing dates as strings is just a really bad idea and you should fix your data structure