I have two datasets. The first, big_dataset
, has around 3000 columns, most of which are never used. The second, column_list
, contains a single column called column_name
with around 100 values. Each value is the name of a column I want to keep.
I want to filter big_dataset
so that only columns in column_list
are kept, and the rest are discarded.
If I were using Pandas dataframes in Python, this would be a trivial task:
cols = column_list['column_name'].tolist()
smaller_dataset = big_dataset[cols]
However, I can't figure out the SAS equivalent. Proc Transpose
doesn't let me turn the rows into headers. I can't figure out a statement in the data step that would let this work, and as far as I'm aware this isn't something that Proc SQL
could handle. I've read through the docs on Proc Datasets
and that doesn't seem to have what I need either.
To obtain a list of columns from column_list
to use against big_dataset
, you can query the column_list
table and put the result into a macro variable. This can be achieved with PROC SQL
and the SEPARATED BY
clause:
proc sql noprint;
select column_name
into :cols separated by ','
from column_list;
create table SMALLER_DATASET AS
select &cols.
from WORK.BIG_DATASET;
quit;
Alternatively you may use SEPARATED BY ' '
and then use the resulting list in a KEEP
statement or dataset option:
proc sql noprint;
select column_name
into :cols separated by ' '
from column_list;
quit;
data small_dataset;
set big_dataset (keep=&cols.);
/* or keep=&cols.; */
run;