Search code examples
sasin-memory

SAS Viya 3.4 Manipulation of a table in memory


I will give what I want to implement with an example below. After this I will give you my two approches. I am working on a 3.4 SAS Viya Platform.

EXAMPLE: I have a table (MYTABLE) and this table is promoted on a global caslib (CKCAS). This table contains 10 rows and 5 columns.

MYTABLE

column1 column2 column3 column4 date
aaa 4567 gtt 44 20210201
aa 5535 faas 44 20210202
fd 23 axv 44 20210203
sd 736 azxq 44 20210204
ghy 9008 feet 44 20210205
lk 3339 wqopp 44 20210206
yj 112 poo 44 20210207
trr 3634 piuy 44 20210208
hrfthr 689 iuyt 44 20210209
rt 2345 uio 44 20210210

The client asked from me to delete a few rows from the table. His goal here is to retain the latest (by column 'date') 5 days. Below is the "desired" table:

column1 column2 column3 column4 date
lk 339 wqopp 44 20210206
yj 112 poo 44 20210207
try 3634 piuy 44 20210208
hrfthr 689 iuyt 44 20210209
rt 2345 uio 44 20210210

IMPORTANT! The table needs to be promoted and accessible from all sessions! Right now, there is a job every day that collects data for the client and append them on MYTABLE. This implementation will not change!

APPROCHE 1

cas christos;
libname KAPPA cas caslib="CKCAS";
 
proc sql;
delete from KAPPA.MYTABLE
where date<20210206;
quit;
cas christos terminate;

If I execute the code below, will I achieve the goal table? Will the table MYTABLE be promoted after the deletion of the rows?

End of APPROCHE 1

APPROCHE 2 The second approche that I thought is to save the table for memory to hard drive of the server. Procced with the deletion there and then to promote the table again in memory. With code:

cas christos;
caslib _all_ assign;
casdatalimit=ALL;
libname pathtohd '/tmp/tempo_directory/';
libname KAPPA cas caslib="CKCAS";

data pathtohd.MYTABLE;
set KAPPA.MYTABLE;
run;

proc sql;
delete from pathtohd.MYTABLE
where date<20210206;
quit;
 
proc casutil;
load data=pathtohd.MYTABLE casout="MYTABLE" outcaslib="CKCAS" replace promote;
run;

cas christos terminate;

end of APPROCHE 2

My clients' tables are containing hundredths of millions of rows and the most important is that the tables are on a Production Environment. I think this implementation needs to be in memory. Do you have any better idea? Will any of my approches work? **This implementation needs to be on over 100 tables + these tables updating from SAS jobs daily. So, my second approche seems not the most appropriate.

Thnaks in advance


Solution

  • It's best to use CAS actions for this; however, the table.deleteRows action was not added until Viya 3.5. Promoted tables were originally meant to be basically immutable: when a table is up and promoted in CAS for everyone, it generally should only be appended to with good data. Bad data, of course, gets in to production systems sometimes and it needs to be modified.

    Since you need to delete rows, the safest way would be to create a copy of it in CASUSER, drop the old table, then promote the updated one. It's likely that their CAS cluster has more than enough memory to do this.

    Double-check if it's partitioned or ordered a specific way before doing this. You can add the partition and order statements to your dataset options. If you need to save the table to persistent storage, use the save statement in proc casutil as well.

    With this method, all changes are done only in CAS.

    data causer.mytable;
        set caslib.mytable;
        where date < '06FEB2021'd;
    run;
    
    proc casutil;
        droptable casdata="mytable" incaslib="caslib" outcaslib="caslib";
        promote casdata="mytable" incaslib="casuser" outcaslib="caslib";
    run;