suppose to have the following:
data have; input ID :$20. Start :date9. End :date9.; format start end ddmmyy9.; cards; 0001 01JAN2015 30JUN2015 0001 01JUL2015 01FEB2016 0001 02FEB2016 11DEC2016 0001 12DEC2016 06FEB2017 0001 07FEB2017 31DEC2017 0002 01JAN2016 31DEC2017 0002 01JAN2018 01MAR2018 0002 01APR2018 31NOV2018 ...................... ;
and a list of dates:
data dates;
input dates :$20.;
format dates ddmmyy9.;
cards;
01JAN2015
31DEC2015
01JAN2016
31DEC2016
01JAN2017
31DEC2017
01JAN2018
31DEC2018
;
Is there a way to know if, for each ID, each date is in the range? For example: the ID 0001 contains all dates except 01JAN2018 and 31DEC2018. Moreover, for each year I need to count how many IDs start at 01/01 and end at 31/12 so they appear for the entire year. For example, ID 0002 will not be counted for 2018 because it ends before 31/12. Desired output:
ID 01JAN2015 31DEC2015 01JAN2016 31DEC2016 01JAN2017 31DEC2017 01JAN2018 31DEC2018 0001 yes yes yes yes yes yes no no 0002 no no yes yes yes yes yes no
Final table:
Year Count 2015 1 2016 2 2017 2 2018 0
To match the dates in the range I tried:
proc sql; create table want as; select dates as t1; join have as t2; t2.dates between t1.start and t1.end order by 1,2; quit;
Unfortunately I lose the ID correspondence.
Can anyone help me please?
Thank you in advance
The first output can be achieved using proc sql
followed by proc transpose
and then a data
step.
(The variables are created in a different order than in your desired output - hopefully that is not a problem for you.)
The second can be done with a data
step followed by proc summary
.
NB - I changed the invalid date 31NOV2018 to 30NOV2018 before running my code.
* First, join the HAVE and DATES data sets to check which dates fall within a range;
proc sql;
create table want1 as
select h.ID, d.dates format=date9., 'yes' as flag
from dates d left join have h on (d.dates between h.start and h.end)
order by ID, dates
;
run;
* transpose this to the wider format required. Cells where there is no match will be blank;
proc transpose data=want1 out=want1 (where=(ID ne '') drop=_name_);
by ID;
id dates;
var flag;
run;
* Now populate the blank cells with "no";
data want1;
set want1;
array datevars (*) _all_;
do i = 1 to dim(datevars);
if datevars(i) = '' then datevars(i)='no';
end;
drop i;
run;
* This assumes the data are in order by ID, start and end - if not then sort the data before this step;
* First read throught the HAVE data set and set count = 1 for each year that is fully covered by each ID;
data want2;
set have;
by ID;
retain first last count; * FIRST and LAST are the earliest START and latest END respectively for each ID;
if first.ID then do;
first=start;
count=0;
end;
if last.ID then do;
last=end;
do yr=year(first) to year(last); * for each ID we want to cycle through all the years covered;
if first <= mdy(1,1,yr) and last >= mdy(12,31,yr) then count = 1; * sets count=1 if the year is fully contained within the range;
output;
count=0;
end;
end;
drop start end first last;
format first last date9.;
run;
* The step above produces a row for each combination of year * ID, we just want totals by year which is done by PROC SUMMARY;
proc summary data=want2 nway;
class yr;
var count;
output out=want2 (drop=_type_ _freq_) sum=;
run;