Search code examples
sasprocdatastep

difference between performance of proc copy and that of data step


Consider two scenarios of sales contains over 10M records, 50 bytes for one record:

SCENARIO ONE:

data STORE.sale;
set work.sale;
run;

proc copy in = work out = STORE;
select sale;
run;

SCENARIO TWO:

data STORE.sale;
set FACTORY.sale;
run;

proc copy in = FACTORY out = STORE;
select sale;
run;

In these two scenarios, is there a reason to choose one way over another?


Solution

  • Choose proc copy. Apart from being faster (no need to page data and feed through the PDV), it will also retain your indexes (and can be configured to retain integrity constraints as well with CONSTRAINT=YES option)

    If the source in your set statement is actually a VIEW though, you may wish to use the data step in order to preserve the target values for that point in time..

    To summarise:

    Proc copy will copy the dataset (think copy / paste). Values cannot be modified.

    data step will rebuild the dataset (one record at a time) Values can be modified at the same time.