Search code examples
sqlamazon-redshiftamazon-redshift-spectrumsql-rank

Redshift get rows with highest priority and timestamp


I have a table in redshift with records as follows


+====+======+=========+============+
| Id | Book | Action  | Timestamp  |
+====+======+=========+============+
| 1  | ABC  | ADDED   | 1111111111 |
+----+------+---------+------------+
| 1  | ABC  | REMOVED | 2222222222 |
+----+------+---------+------------+
| 1  | ABC  | CHECKED | 3333333333 |
+----+------+---------+------------+
| 2  | XYZ  | ADDED   | 4444444444 |
+----+------+---------+------------+
| 2  | XYZ  | CHECKED | 5555555555 |
+----+------+---------+------------+
| 5  | DEF  | CHECKED | 6666666666 |
+----+------+---------+------------+
| 5  | DEF  | CHECKED | 7777777777 |
+====+======+=========+============+

What I want is to get single record for each book. In case of duplicates, prioritize by Action with CHECKED having the lowest rank, rest all actions will have same priority and then pick the most recent record.

Expected Output


+====+======+=========+============+
| Id | Book | Action  | Timestamp  |
+====+======+=========+============+
| 1  | ABC  | REMOVED | 2222222222 |
+----+------+---------+------------+
| 2  | XYZ  | ADDED   | 4444444444 |
+----+------+---------+------------+
| 5  | DEF  | CHECKED | 7777777777 |
+====+======+=========+============+

Solution

  • A simple method uses row_number():

    select t.*
    from (select t.*,
                 row_number() over (partition by id
                                    order by (case action when 'Checked' then 1 else 2 end) desc,
                                             timestamp desc
                                   ) as seqnum
          from t
         ) t
    where seqnum = 1;