Search code examples
sasintervalscollate

SAS Collating/summarizing observations over a time interval


I have a set of trading data that has observations in milliseconds but infrequent. To make it a reasonable timeseries I created a macro to build a grid (one observation per second). Now I want my dataset to fit into the grid (summing up the volumes that happened in that second and using the last quote/trade price. I am very thankful for any ideas. See code below.

Best

%macro makeGrid;
    proc sort data=data.inputdataset; by id date milliseconds descending type order; run;    
    options nomprint;
    data data.outputgrid (keep=id date miliseconds type order); 
    set data.inputdataset; 
    by id date; 
        if first.date;
        Type="grid"; Order=0;
        %do i=((9*60*60)+(10*60)) %to ((16*60*60)+(50*60)); 
            milliseconds=&i; output; 
        %end;
    run;
    options mprint;
%mend makeGrid;

A very desperate try was this:

data data.merged;
merge data.outputgrid data.inputdataset;
by id date milliseconds;
if first.id then do;
if milliseconds collate = sum(volume)
run;

Grid:

ID date time price volume
ABC 01/01/15 0801000 
ABC 01/01/15 0802000 
ABC 01/01/15 0803000 
ABC 01/01/15 0804000

Data example:

ID date time price volume
ABC 01/01/15 0800004 1,55 100
ABC 01/01/15 0800110 1,56 200
ABC 01/01/15 0800152 1,52 300
ABC 01/01/15 0800210 1,51 400
ABC 01/01/15 0800352 1,50 200
ABC 01/01/15 0800384 1,51 400

Output:

ID date time price volume
ABC 01/01/15 0801000 1,55 100
ABC 01/01/15 0802000 1,52 500
ABC 01/01/15 0803000 1,51 400
ABC 01/01/15 0804000 1,51 600

If there is no trade in that second, volume should be 0 and price the last second's price.

EDIT:

so, thanks to Chris I made the Grid without Macro, but the proc SQL is not working as I want.

data grid1;
do seconds = '09:10't to '16:50't ;
    output ;
  end ;
/*  id=.;*/
/*  date=.;*/
/*  format ric $12. ;*/
/*  format Date best12. ;*/
run;

 proc sql ;
  create table want as
  select a.id, a.date, a.miliseconds, sum(a.volume) as Volume
  from have a
     left join
       grid1 b  on a.id = b.id
                    and a.date = b.date
                    and a.miliseconds = floor(b.seconds)
  group by a.id, a.date, a.miliseconds ;
quit;
run;

So, id and date couldn't be found, so I created them in the data step. But then id isn't the same format, which I tried to fix but failed. It seems I have an error when merging the have dataset to the want by matching the miliseconds between the seconds gridlines. What am I missing? Always thankful for input! Best


Solution

  • First of all, thank you all for the answers. I figured how to do it. There might be an easier way but for me, this now is sufficient:

    data data.test;
    set data.taq;
    next_full_sec = int((miliseconds-1) / 1000) + 1;
    run;
    
    data data.grid;
    do seconds = 1 to 86400;
      output;
    end;
    run;
    
    proc summary data=data.test;
    by ric date time next_full_sec;
    var volume;
    output out=data.int (drop=_type_) sum=;
    run;
    
    proc sort data=data.test (keep=ric date) out=data.id_date nodupkey;
    by ric date time;
    run;
    
    proc sql;
    create table data.testseconds as
    select ric, date, time, seconds
    from data.id_date, data.grid
    order by ric, date, time, seconds
    ;
    quit;
    
    data data.testout;
    merge
      data.testseconds (in=a rename=(fullseconds=next_full_sec))
      data.int (in=b)
    ;
    by ric date time next_full_sec;
    if a;
    if not b
    then do;
      _freq_ = 0;
      volume = 0;
    end;
    run;
    
    proc print noobs;
    run;