Search code examples
sas

SAS - Keep only columns listed in a separate dataset


I have two datasets. The first, big_dataset, has around 3000 columns, most of which are never used. The second, column_list, contains a single column called column_name with around 100 values. Each value is the name of a column I want to keep.

I want to filter big_dataset so that only columns in column_list are kept, and the rest are discarded.

If I were using Pandas dataframes in Python, this would be a trivial task:

cols = column_list['column_name'].tolist()
smaller_dataset = big_dataset[cols]

However, I can't figure out the SAS equivalent. Proc Transpose doesn't let me turn the rows into headers. I can't figure out a statement in the data step that would let this work, and as far as I'm aware this isn't something that Proc SQL could handle. I've read through the docs on Proc Datasets and that doesn't seem to have what I need either.


Solution

  • To obtain a list of columns from column_list to use against big_dataset, you can query the column_list table and put the result into a macro variable. This can be achieved with PROC SQL and the SEPARATED BY clause:

    proc sql noprint;
      select column_name
      into :cols separated by ','
      from column_list;
      create table SMALLER_DATASET AS
      select &cols.
      from WORK.BIG_DATASET;
    quit;
    

    Alternatively you may use SEPARATED BY ' ' and then use the resulting list in a KEEP statement or dataset option:

    proc sql noprint;
      select column_name
      into :cols separated by ' '
      from column_list;
    quit;
    
    data small_dataset;
      set big_dataset (keep=&cols.);
      /* or keep=&cols.; */
    run;