Search code examples
mysqlgroup-bymaxrdbmspartition-by

MySQL query to update all "duplicate" rows except the latest one


SO...

I have a table (let's call it Data Location) like...

Data Location ID    Data ID    Location Type    Location URL   Status    Date
1                   1          Foo              foo/3          Valid     10-21-2014
2                   1          Bar              bar/1          Valid     10-21-2014
3                   1          Foo              foo/2          Valid     03-20-2013
4                   1          Foo              foo/1          Valid     12-01-2010

...and so on. This table has many different Data IDs (I didn't show them for simplicity). The idea is that there should only be 1 Valid entry for a given Data ID and Location Type and as you can see above, all of Foo locations for Data ID 1 are all valid (foo/1, foo/2, foo/3).

Can someone please help me construct a query to update all of the duplicate records' (same Data ID and Location Type) Status column to Invalid EXCEPT that latest entry. I have a query that can identify the rows that fit the duplicate criteria, but I am unsure of how to combine group by with max (or not max?) and update. It seems like partition by may be the way, but I am a bit rusty with queries, so I appreciate any help. So for the data above, I would expect the result to be...

Data Location ID    Data ID    Location Type    Location URL   Status    Date
1                   1          Foo              foo/3          Valid     10-21-2014
2                   1          Bar              bar/1          Valid     10-21-2014
3                   1          Foo              foo/2          Invalid   03-20-2013
4                   1          Foo              foo/1          Invalid   12-01-2010

...thanks in advance!


Solution

  • You can use one UPDATE statement:

    UPDATE Data_Location u
    INNER JOIN (
      SELECT `Data ID`, `Location Type`, MAX(`Date`) AS max_date
      FROM Data_Location
      GROUP BY `Data ID`, `Location Type`
    ) t ON u.`Data ID` = t.`Data ID` 
       AND u.`Location Type` = t.`Location Type`
    SET u.Status = 'Invalid'
    WHERE u.`Date` <> t.max_date
    

    Test on SQL Fiddle