Search code examples
mysqlquery-performance

MYSQL drop duplicates of userid


I thought I'd made the column userid in my table "userslive" unique, but somehow must have made a mistake. I've seen multiple answers to this question, but I'm afraid of messing up again so I hope someone can help me directly.

So this table has no unique columns, but I've got a column "timer" which was the timestamp of scraping the data. If possible I'd like to drop rows with the lowest "timer" with duplicate "userid" column.

It's a fairly big table at about 2 million rows (20 columns). There is about 1000 duplicate userid which I've found using this query:

SELECT userid, COUNT(userid) as cnt FROM userslive GROUP BY userid HAVING (cnt > 1);

Is this the correct syntax? I tried this on a backup table, but I suspect this is too heavy for a table this big (unless left to run for a very long time.

  DELETE FROM userslive using userslive,
      userslive e1
  where userslive.timer < e1.timer
      and userslive.userid = e1.userid  

Is there a quicker way to do this?

EDIT: I should say the "timer" is not a unique column.


Solution

  • DELETE t1.*     /* delete from a copy named t1 only */
    FROM userslive t1, userslive t2
    WHERE t1.userid = t2.userid  
      AND t1.timer < t2.timer
    

    fiddle

    Logic: if for some record (in a copy aliased as t1) we can find a record (in a table copy aliased as t2) with the same user but with greater/later timer value - this record must be deleted.