Search code examples
sqlamazon-redshiftgreatest-n-per-group

SQL Redshift - Select rows from similar rows in table that meet a certain criteria


event_id alert_id alert_timestamp value
1 x 2022-10-10 -2
1 x 2022-10-10 4
1 x 2022-10-10 5
2 z 2022-09-02 3

I have a table that has rows that match on event_id, alert_id and alert_timestamp. I want to only keep the single row that has the smallest value in the value column where all other rows are matching on event_id, alert_id and alert_timestamp.

NOTE: this is just one example of matching rows, the table has a mix of different rows than the example given, that match on event_id, alert_id and alert_timestamp

Also if there are single rows i.e. a row that has no other row that matches on event_id, alert_id and alert_timestamp, this should be kept as is.


Solution

  • That's a typical top-1-per-group problem. In Redshift you can use row_number() to solve it:

    select event_id, alert_id, alert_timestamp, value
    from (
        select t.*, 
            row_number() over(partition by event_id, alert_id, alert_timestamp order by value) rn
        from mytable t
    ) t
    where rn = 1
    

    Note that if you really had only 4 columns, then aggregation would be sufficient:

    select event_id, alert_id, alert_timestamp, min(value) as value
    from mytable
    group by event_id, alert_id, alert_timestamp, value