Search code examples
sortingsasamazon-redshiftcolumnsorting

Redshift tables are not preserving the SAS sort order after loading the data into Redshift


When I load a SAS dataset sorted by a key(contains duplicate keys) using PROC SORT into Redshift, the same sorting order is not preserved in Redshift. For eg. When I compare the SAS dataset with Redshift table, they are not in the same sorting order, the data is really jumbled within the duplicates keys and very hard to find a pattern of how data stores in Redshift columns or why that's happening.

SAS dataset vs Redshift table sort order]1

Any suggestions or guidance on maintaining the same sorting after data load into Redshift from SAS would be very helpful. Thanks!

The code used:

PROC APPEND BASE = Target_Table (bulkload=yes bl_compress=yes bl_bucket='xxx' bl_default_dir = 'xxx' bl_use_escape =YES) 
DATA = Source_Table force; 
RUN;

Solution

  • You need to set the sortkey when creating the table, eg:

    proc sql;
    connect using myredlib as rs;
    execute (create table iwant (
       key INTEGER
      ,data char(1)
      ,load_dttm TIMESTAMP
    )
    distkey(key)
    compound sortkey(key,data) )by rs;
    

    If you have duplicates or you just need to preserve the source data order then create a new column (eg id) in a data step prior to the append, eg:

    data iwant;
      set ihave;
      id=_n_;
    run;
    proc append .....