Tags: sas, wrds-compusat, wrds

SAS NOTSORTED Equivalent


I was using the following code to analyze data:

 set taq.cq_&yyyymmdd:;
 by symbol date time NOTSORTED ex;

I am running the code on thousands of datasets, one per day. When &yyyymmdd specifies only one dataset (a single day, for example 20130102), it works. However, when I try to run it for multiple datasets (for example, 201301:), SAS returns the following error:

BY NOTSORTED/NOBYSORTED cannot be used with SET statement when
more than one data set is specified. 

If I cannot use NOTSORTED here, what is an equivalent statement that I could use?

My understanding of the keyword NOTSORTED is that you use it when the data is not sorted yet. Does that mean I need to sort the data first? If so, how?

I am also confused about which variables NOTSORTED refers to. Does it only affect "time", or does it affect "symbol, date, time"?

Many thanks!

UPDATE#2:

The rest of the process immediately following the SET statement is as follows (pseudocode, as I don't have permission to post the original code):

 data _quotes;
   /* SET STATEMENT HERE */
   /* Change the name of a variable in the dataset (variable name is EXN). */
   /* last.EXN in an IF statement. If the condition is satisfied, label EXN. */
   /* Drop some variables. */
 run;

 data NEWDATASET (sortedby=SYMBOL DATE TIME index=(SYMBOL)
                  label="WRDS-TAQ NBBO Data");
   set _quotes;
   by symbol date time;
   /* .... */
 run;

Solution

  • NOTSORTED tells SAS that observations with the same BY values are grouped together, but that the groups are not necessarily in ascending or descending order. The data need not have gone through an explicit PROC SORT, as long as the rows of each group are adjacent.

    NOTSORTED applies to every variable in the BY statement, not only to the one it is written next to. I suspect you don't fully understand BY-group processing, and NOTSORTED is a bit dangerous to use without that understanding: if observations that belong to the same group are not adjacent, SAS treats each run as a separate BY group and does not produce an error. The correct workaround depends on what the rest of your process does, to be honest.

    I would suggest reviewing the documentation on BY-group processing. It's quite in depth and has lots of samples that illustrate the different types of calculations.

    http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#n138da4gme3zb7n1nifpfhqv7clq.htm

    NOTSORTED is often used in example posts either to avoid a sort or to keep a custom order that's difficult to implement in other ways. Explicitly sorting will remove this issue, but you may also be misunderstanding how SAS processes data when a SET statement that names several data sets is followed by a BY statement. I believe this is called interleaving.

    http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#n1tgk0uanvisvon1r26lc036k0w7.htm
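
    One possible workaround, sketched under assumptions: since the error only concerns a SET statement that names more than one data set, the daily files can first be concatenated through a DATA step view, and BY ... NOTSORTED can then be applied to that single view. The data set names cq_20130102, cq_20130103 and all_days and the sample rows are hypothetical stand-ins for the TAQ files, and the subsetting IF is only a placeholder for the real last.EX logic, which was not posted.

    ```sas
    /* Hypothetical stand-ins for two daily quote files in WORK. */
    data cq_20130102;
      input symbol $ date :yymmdd8. time :time8. ex $;
      format date yymmddn8. time time8.;
      datalines;
    A 20130102 09:30:00 N
    A 20130102 09:30:00 T
    A 20130102 09:30:01 N
    B 20130102 09:30:00 T
    ;
    run;

    data cq_20130103;
      input symbol $ date :yymmdd8. time :time8. ex $;
      format date yymmddn8. time time8.;
      datalines;
    A 20130103 09:30:00 T
    A 20130103 09:30:00 N
    B 20130103 09:30:00 N
    ;
    run;

    /* Concatenate the daily files through a view, so no copy is made.
       No BY statement here, so the files are stacked, not interleaved. */
    data all_days / view=all_days;
      set cq_201301:;
    run;

    /* SET now names a single data set (the view), so NOTSORTED is accepted.
       Symbol A shows up again after B once the second day starts; NOTSORTED
       treats that as a new BY group and raises no error. */
    data _quotes;
      set all_days;
      by symbol date time notsorted ex;
      if last.ex;   /* placeholder: keep the last row of each run of EX */
    run;

    /* The next step uses BY symbol date time without NOTSORTED,
       so the rows must really be in that order before it runs. */
    proc sort data=_quotes;
      by symbol date time;
    run;
    ```

    Whether this is appropriate depends on the intent of the original code. If the order of exchanges within a timestamp does not matter, sorting the concatenated data by symbol, date, time and ex up front and dropping NOTSORTED altogether may be the simpler route.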