When I load a SAS dataset sorted by a key(contains duplicate keys) using PROC SORT into Redshift, the same sorting order is not preserved in Redshift. For eg. When I compare the SAS dataset with Redshift table, they are not in the same sorting order, the data is really jumbled within the duplicates keys and very hard to find a pattern of how data stores in Redshift columns or why that's happening.
]1
Any suggestions or guidance on maintaining the same sorting after data load into Redshift from SAS would be very helpful. Thanks!
The code used:
PROC APPEND BASE = Target_Table (bulkload=yes bl_compress=yes bl_bucket='xxx' bl_default_dir = 'xxx' bl_use_escape =YES)
DATA = Source_Table force;
RUN;
You need to set the sortkey
when creating the table, eg:
proc sql;
connect using myredlib as rs;
execute (create table iwant (
key INTEGER
,data char(1)
,load_dttm TIMESTAMP
)
distkey(key)
compound sortkey(key,data) )by rs;
If you have duplicates or you just need to preserve the source data order then create a new column (eg id
) in a data step prior to the append, eg:
data iwant;
set ihave;
id=_n_;
run;
proc append .....