I'm trying to write SAS code that can loop over a dataset that contains event dates that looks like:
Data event;
input Date;
cards;
20200428
20200429
;
run;
And calculate averages for the prior three-days from another dataset that contains dates and volume that looks like:
Data vol;
input Date Volume;
cards;
20200430 100
20200429 110
20200428 86
20200427 95
20200426 80
20200425 90
;
run;
For example, for date 20200428 the average should be 88.33 [(95+80+90)/3] and for date 20200429 the average should be 87.00 [(86+95+80)/3]. I want these values and the volume of the date to be saved on a new dataset that looks like the following if possible.
Data clean;
input Date Vol Avg;
cards;
20200428 86 88.33
20200429 110 87.00
;
run;
The actual data that I'm working with is from 1970-2010. I may also increase my average period from 3 days prior to 10 days prior, so I want to have flexible code. From what I've read I think a macro and/or call symput might work very well for this, but I'm not sure how to code these to do what I want. Honestly, I don't know where to start. Can anyone point me in the right direction? I'm open to any advice/ideas. Thanks.
A SQL statement is by far the most succinct code for obtaining your result set. The query will join with 2 independent references to volume data. The first for obtaining the event date volume, and the second for computing the average volume over the three prior days.
The date
data should be read in as a SAS date, so that the BETWEEN condition will be correct.
Data event;
input Date: yymmdd8.;
cards;
20200428
20200429
;
run;
Data vol;
input Date: yymmdd8. Volume;
cards;
20200430 100
20200429 110
20200428 86
20200427 95
20200426 80
20200425 90
;
run;
* SQL query with GROUP BY ;
proc sql;
create table want as
select
event.date
, volume_one.volume
, mean(volume_two.volume) as avg
from event
left join vol as volume_one
on event.date = volume_one.date
left join vol as volume_two
on volume_two.date between event.date-1 and event.date-3
group by
event.date, volume_one.volume
;
* alternative query using correlated sub-query;
create table want_2 as
select
event.date
, volume
, ( select mean(volume) as avg from vol where vol.date between event.date-1 and event.date-3 )
as avg
from event
left join vol
on event.date = vol.date
;