Search code examples
sqlsortingamazon-redshiftdistinct

Redshift: Sorting order breaks when used with DISTINCT clause


I have data like below:

select study_id , updated_by ,created_at 
from my_table ps 
where study_id = '1';

enter image description here

I want sort the records in Descending order based on created_at and pick distinct study_id and updated_by.

I tried below:

I've ran into a weird issue here in Redshift. Please consider the below query:

select study_id , updated_by 
from my_table ps 
where study_id = '1' 
ORDER BY created_at DESC ;

This results in:

enter image description here

But I need to pick only Distinct records. So, I used this query:

select DISTINCT study_id , updated_by 
from my_table ps
where study_id = '1' 
ORDER BY created_at DESC ;

This results in:

enter image description here

As you can see, the record with maya2 is now being shown as latest instead of maya1.

Why does the sorting breaks with DISTINCT? How can I fix this?


Solution

  • so in your data set only maya1 is duplicated and maya1 holding the timestamp 7:31 and 7:29 but maya2 is holding 7:30 , so when you are using distinct then query engine remove one maya1 which is holding 7:31 as a result maya2 is getting the top posintion

    if you need latest then simply use max

    select study_id , updated_by ,max(created_at) as created_at
    from my_table ps
    where study_id = '1' 
    group by study_id , updated_by 
    ORDER BY created_at DESC 
    

    if you need only study_id , updated_by then use row_number()

    select  select study_id , updated_by from
    ( select study_id , updated_by ,
       row_number() over(partition by study_id , updated_by     ORDER BY created_at DESC ) rn
        from my_table ps
        where study_id = '1' 
     ) a where a.rn=1