I will give what I want to implement with an example below. After this I will give you my two approches. I am working on a 3.4 SAS Viya Platform.
EXAMPLE: I have a table (MYTABLE) and this table is promoted on a global caslib (CKCAS). This table contains 10 rows and 5 columns.
MYTABLE
column1 | column2 | column3 | column4 | date |
---|---|---|---|---|
aaa | 4567 | gtt | 44 | 20210201 |
aa | 5535 | faas | 44 | 20210202 |
fd | 23 | axv | 44 | 20210203 |
sd | 736 | azxq | 44 | 20210204 |
ghy | 9008 | feet | 44 | 20210205 |
lk | 3339 | wqopp | 44 | 20210206 |
yj | 112 | poo | 44 | 20210207 |
trr | 3634 | piuy | 44 | 20210208 |
hrfthr | 689 | iuyt | 44 | 20210209 |
rt | 2345 | uio | 44 | 20210210 |
The client asked from me to delete a few rows from the table. His goal here is to retain the latest (by column 'date') 5 days. Below is the "desired" table:
column1 | column2 | column3 | column4 | date |
---|---|---|---|---|
lk | 339 | wqopp | 44 | 20210206 |
yj | 112 | poo | 44 | 20210207 |
try | 3634 | piuy | 44 | 20210208 |
hrfthr | 689 | iuyt | 44 | 20210209 |
rt | 2345 | uio | 44 | 20210210 |
IMPORTANT! The table needs to be promoted and accessible from all sessions! Right now, there is a job every day that collects data for the client and append them on MYTABLE. This implementation will not change!
APPROCHE 1
cas christos;
libname KAPPA cas caslib="CKCAS";
proc sql;
delete from KAPPA.MYTABLE
where date<20210206;
quit;
cas christos terminate;
If I execute the code below, will I achieve the goal table? Will the table MYTABLE be promoted after the deletion of the rows?
End of APPROCHE 1
APPROCHE 2 The second approche that I thought is to save the table for memory to hard drive of the server. Procced with the deletion there and then to promote the table again in memory. With code:
cas christos;
caslib _all_ assign;
casdatalimit=ALL;
libname pathtohd '/tmp/tempo_directory/';
libname KAPPA cas caslib="CKCAS";
data pathtohd.MYTABLE;
set KAPPA.MYTABLE;
run;
proc sql;
delete from pathtohd.MYTABLE
where date<20210206;
quit;
proc casutil;
load data=pathtohd.MYTABLE casout="MYTABLE" outcaslib="CKCAS" replace promote;
run;
cas christos terminate;
end of APPROCHE 2
My clients' tables are containing hundredths of millions of rows and the most important is that the tables are on a Production Environment. I think this implementation needs to be in memory. Do you have any better idea? Will any of my approches work? **This implementation needs to be on over 100 tables + these tables updating from SAS jobs daily. So, my second approche seems not the most appropriate.
Thnaks in advance
It's best to use CAS actions for this; however, the table.deleteRows
action was not added until Viya 3.5. Promoted tables were originally meant to be basically immutable: when a table is up and promoted in CAS for everyone, it generally should only be appended to with good data. Bad data, of course, gets in to production systems sometimes and it needs to be modified.
Since you need to delete rows, the safest way would be to create a copy of it in CASUSER
, drop the old table, then promote the updated one. It's likely that their CAS cluster has more than enough memory to do this.
Double-check if it's partitioned or ordered a specific way before doing this. You can add the partition and order statements to your dataset options. If you need to save the table to persistent storage, use the save
statement in proc casutil
as well.
With this method, all changes are done only in CAS.
data causer.mytable;
set caslib.mytable;
where date < '06FEB2021'd;
run;
proc casutil;
droptable casdata="mytable" incaslib="caslib" outcaslib="caslib";
promote casdata="mytable" incaslib="casuser" outcaslib="caslib";
run;