I'm trying to automatically create some integrity constraints based on this dataset:
ds_name | var_name | ic_clause | ic_msg
--------+----------+------------------------------------+-----------------------
tableA | var1 | primary key($var$) | $var$ is a primary key
tableB | var2 | check(where=($var$ in ('a', 'b'))) | invalid $var$ value
The idea is to create a generic program that loops through this dataset and creates the ICs accordingly. In this specific case, the equivalent hard coded program would be:
proc datasets nolist;
modify tableA;
ic create primary key(var1)
message = "var1 is a primary key";
quit;
proc datasets nolist;
modify tableB;
ic create check(where=(var2 in ('a', 'b')))
message = "invalid var2 value";
quit;
These are the steps I imagine in the program, but I need help to translate them into actual code:
Run a generic proc datasets, eg.:
proc datasets nolist;
modify &my_ds;
ic create &my_clause
message = &my_msg;
quit;
Can anyone please help me with this code? I don't know whether the steps I've suggested are the best way to implement what I'm trying to do. Basically I'm trying to simulate a relational database within SAS, and to automate things the maximum possible.
Thank you!
You probably will find that you cannot turn SAS into a DBMS. It might be better to use your metadata to generate programs that check the data instead of trying to implement integrity constraints.
But the concept of data driven code generation is interesting one so lets see if we can use your example to demonstrate how to generate code from metadata. I find that it works better when you match the variable names in the metadata to the code that needs to be generated. So lets call the variable that is used to create the MESSAGE=
option on the IC statement MESSAGE
.
Now we can use a simple data step to generate the code. Not sure why you used pseudo code in the constraint and message fields instead of just hard coding the values, but we can use TRANWRD()
function to replace the $varname$
strings with the value of the VARNAME
variable.
So let's make a sample metadata file.
data ic_metadata;
infile datalines dlm="|";
length libname $8 memname $32 varname $32 constraint message $200;
input libname memname varname constraint message ;
datalines;
work|tableA|var1|primary key($varname$) |$varname$ is a primary key
work|tableB|var2|check(where=($varname$ in ('a', 'b')))|invalid $varname$ value
;
And some sample data to work on.
data tablea tableb ;
length var1 8 var2 $8 ;
var1+1;
var2='a';
run;
Now let's use the metadata to generate the code and %INCLUDE
to run it.
filename code temp;
data _null_;
file code ;
set ic_metadata ;
by libname memname ;
if first.libname then put 'proc datasets lib=' libname 'nolist;' ;
if first.memname then put ' modify ' memname ';' ;
constraint=tranwrd(constraint,'$varname$',trim(varname));
message=tranwrd(message,'$varname$',trim(varname));
put 'ic create ' constraint message= :$quote. ';' ;
if last.memname then put 'run;';
if last.libname then put 'quit;' ;
run;
%include code / source2 ;
So running the example we get a SAS log like this:
161 +proc datasets lib=work nolist;
162 + modify tableA ;
163 +ic create primary key(var1) message="var1 is a primary key" ;
NOTE: Integrity constraint _PK0001_ defined.
164 +run;
NOTE: MODIFY was successful for WORK.TABLEA.DATA.
165 + modify tableB ;
166 +ic create check(where=(var2 in ('a', 'b'))) message="invalid var2 value" ;
NOTE: Integrity constraint _CK0001_ defined.
167 +run;
NOTE: MODIFY was successful for WORK.TABLEB.DATA.
168 +quit;