Search code examples
amazon-redshiftpartitioningmin

redshift sql row selection based on min date


I need to extract all rows from a large data set with 30+ columnns based on the min time date, account can repeat across days and execution dates, execution id is distinct and can be thought of as a transaction.

account execution_id execution_datetime execution_date
1 918-46 2023-04-03 15:59:09 2023-04-03
1 740-a2 2023-04-03 15:56:09 2023-04-03
1 747-9b 2023-04-03 15:54:09 2023-04-03
2 2bb-14 2023-04-03 15:54:09 2023-04-03
1 818-a5 2023-04-04 15:47:08 2023-04-04

Ordinarily something like this would work

select omd.*
 from db1 as omd
 inner join (
              select id, min(execution_datetime)
              from db1
              group by execution_date,account,id
            ) as omd1 
                on omd.id = omd1.id; 

This is returning all records vs expected outcome.

account execution_id execution_datetime execution_date
1 747-9b 2023-04-03 15:54:09 2023-04-03
2 2bb-14 2023-04-03 15:54:09 2023-04-03
1 818-a5 2023-04-04 15:47:08 2023-04-04

I'm also can envision there will be a follow-up request to group the transactions across a date range of X so that min transactions across X days will be grouped,and the output might look like this.

account execution_id execution_datetime execution_date
1 747-9b 2023-04-03 15:54:09 2023-04-03
2 2bb-14 2023-04-03 15:54:09 2023-04-03

Looking for a solution that will hopefully support either scenario, thank you.


Solution

  • This is a job for the window function row_number().

    select * 
    from (
      select *,
        row_number() over(partition by execution_date, account,id order by 
      execution_datetime) as row_num
      from db1 )
    where row_num = 1;
    

    You may want to suppress row_numb from the output and to do this you will need list list all 30 columns that you want.

    (sorry if there are any typos)