Search code examples
sassas-macro

How to average a subset of data in SAS conditional on a date?


I'm trying to write SAS code that can loop over a dataset that contains event dates that looks like:

Data event;
     input Date;
     cards;
     20200428
     20200429
     ;
run;

And calculate averages for the prior three-days from another dataset that contains dates and volume that looks like:

Data vol;
     input Date Volume;
     cards;
     20200430  100
     20200429  110
     20200428  86
     20200427  95
     20200426  80
     20200425  90
     ;
run;

For example, for date 20200428 the average should be 88.33 [(95+80+90)/3] and for date 20200429 the average should be 87.00 [(86+95+80)/3]. I want these values and the volume of the date to be saved on a new dataset that looks like the following if possible.

Data clean;
     input Date Vol Avg;
     cards;
     20200428 86 88.33
     20200429 110 87.00
     ;
run;

The actual data that I'm working with is from 1970-2010. I may also increase my average period from 3 days prior to 10 days prior, so I want to have flexible code. From what I've read I think a macro and/or call symput might work very well for this, but I'm not sure how to code these to do what I want. Honestly, I don't know where to start. Can anyone point me in the right direction? I'm open to any advice/ideas. Thanks.


Solution

  • A SQL statement is by far the most succinct code for obtaining your result set. The query will join with 2 independent references to volume data. The first for obtaining the event date volume, and the second for computing the average volume over the three prior days.

    The date data should be read in as a SAS date, so that the BETWEEN condition will be correct.

    Data event;
         input Date: yymmdd8.;
         cards;
         20200428
         20200429
         ;
    run;
    
    Data vol;
         input Date: yymmdd8. Volume;
         cards;
         20200430  100
         20200429  110
         20200428  86
         20200427  95
         20200426  80
         20200425  90
         ;
    run;
    

    * SQL query with GROUP BY ;

    proc sql;
      create table want as
      select 
        event.date
      , volume_one.volume
      , mean(volume_two.volume) as avg
      from event
      left join vol as volume_one
      on event.date = volume_one.date
      left join vol as volume_two
      on volume_two.date between event.date-1 and event.date-3
      group by 
      event.date, volume_one.volume
      ;
    

    * alternative query using correlated sub-query;

      create table want_2 as
      select 
        event.date
      , volume
      , ( select mean(volume) as avg from vol where vol.date between event.date-1 and event.date-3 )
        as avg
      from event
      left join vol
      on event.date = vol.date
      ;