Search code examples
sasdatastep

How can I write a DATA step that will drop all variables from the input dataset except the ones that I explicitly define within the dataset?


I want to generate a new SAS dataset using table foo as the input and one-to-one correspondence with records in the output dataset bar. I wand to drop variables from foo by default but I also require all of the fields of foo be available (to derive new variables) and also that some variables from foo to be kept (if explicitly indicated).

I'm currently managing an explicit list of variables to drop= but it results in long and unwieldy syntax in the data-set-option declaration.*

DATA bar (drop=id data_value2);
set foo;

new_id                 = id;
data_value1            = data_value1;     /* Explicitly included for clarity */
new_derived_data_value = data_value2 * 2; /* etc. */

format new_id                  $fmt_id.
       data_value1             $fmt_dat.
       new_derived_data_value  $fmt_ddat.
;
RUN;

The output table I want should have only fields data_value1, new_data and new_derived_data_value.

I'm looking for the most syntactically succinct way of reproducing the same effect as :

SELECT 
  id AS new_id
  ,data_value1
  ,data_value2 * 2 AS new_derived_data_value 
FROM foo

How can I write a DATA step that will drop all variables from the input dataset except the ones that I explicitly define within the dataset?

* Update: I could use aaa--hhh type notatation but even this can be unwieldy if the ordering of the variables changes over time or I later decide I'd like to keep variable ddd.


Solution

  • I would store the variable names in a macro list, obtained from the DICTIONARY tables. You can then drop them all easily in a data step. e.g.

    proc sql noprint;
    select name into :vars separated by ' '
    from dictionary.columns
    where libname = 'SASHELP' and memname='CLASS';
    quit;
    
    data want (drop=&vars.);
    set sashelp.class;
    name1=name;
    age1=age;
    run;