Keep newest duplicate row depending on multiple Columns

I seem to have a workflow problem with Open Refine (Google Refine 2.5 [r2407]) to do sophisticated duplicate row cleaning. All I have found so far is how to delete duplicate rows based on a single column.

My aim is to delete duplicate rows based on multiple columns, at best, in a specific hierarchy.

Example

Given the following dummy data in Refine

+----+---------+---------+--------+------------+------+-----------------------------------+
| id | timeAgo |  title  | author |    date    | val1 |    [After Refine, keep Record]    |
+----+---------+---------+--------+------------+------+-----------------------------------+
|  1 |      10 | Faust   | Mr. A  | 2014-01-15 |   10 | ->B, older entry                  |
|  2 |      11 | Faust   | Mr. A  | 2014-01-21 |   10 | A (because of Date)               |
|  3 |       8 | Faust   | Mr. A  | 2014-01-15 |   10 | B                                 |
|  4 |       8 | RedHead | Mr. B  | 2014-01-21 |   34 | ->D, older entry                  |
|  5 |       7 | RedHead | Mr. B  | 2014-01-21 |   34 | ->D, same time Ago, but lower ID  |
|  6 |       7 | RedHead | Mr. A  | 2014-01-01 |   13 | C (because of author, date, val1) |
|  7 |       7 | RedHead | Mr. B  | 2014-01-21 |   34 | D                                 |
+----+---------+---------+--------+------------+------+-----------------------------------+

I want to kill the duplicate rows based on following logic. If

title && auther && date && val1 are the same, than
keep the newest (least timeAgo) row, if there are multiple, than
keep the one with the highest id

The Result would be:

+---------+----+---------+---------+--------+------------+------+
| Refined | id | timeAgo |  title  | author |    date    | val1 |
+---------+----+---------+---------+--------+------------+------+
| A       |  2 |      10 | Faust   | Mr. A  | 2014-01-21 |   10 |
| B       |  3 |       8 | Faust   | Mr. A  | 2014-01-15 |   10 |
| C       |  6 |       7 | RedHead | Mr. A  | 2014-01-01 |   13 |
| D       |  7 |       7 | RedHead | Mr. B  | 2014-01-21 |   34 |
+---------+----+---------+---------+--------+------------+------+

Easy Approach?

If there is no other solution, I thankfully take a scripting/GREL one.

But could it be done by Refines famous workflow "recording" to achieve above logic, so it could be extracted and applied to other same format datasets?

My motivation behind this is to enable employees to work more thoughtfully with data (beyond excel) but without confronting them right away with a full blown scripting language.

Solution

That sounds like a straightforward sorting problem.

Sort the records by title, author, time ago, and ID
Re-order rows permanently (IMPORTANT - it won't work if you forget this step)
Blank down on Title & Author
Move those two columns to the two left most positions
Join multivalued cells on remaining columns
Transform all columns from step 5 using value.split(',')[0] to extract the first value (which should be the value for the record you want if you sorted them in the right order