Let's say I have this example data set
data df;
infile datalines delimiter=',';
attrib
id length = $8
cycle length = $8
date length = 8 format = mmddyy10. informat = mmddyy10.
;
input ID cycle date;
datalines;
1, 1, 10/01/2021
1, 2, 10/02/2021
1, 3, 11/03/2021
2, 1, 10/01/2021
2, 1, 10/01/2021
2, 2, 11/04/2021
2, 2, 11/04/2021
2, 3, 10/07/2021
3, 1, 10/02/2021
3, 2, 10/03/2021
3, 3,
;
run;
I wish to calculate the duration between cycle 3 and cycle 1 for each of the ID with the desired result is the output of the following code (we can't use max(date) - min(date)
because there are some missing dates in cycles 3 that's why I am asking the question).
data df_duration;
infile datalines delimiter=',';
attrib
id length = $8
cycle length = $8
date length = 8 format = mmddyy10. informat = mmddyy10.
duration length = $8
;
input ID cycle date duration;
datalines;
1, 1, 10/01/2021, 2
1, 2, 10/02/2021, 2
1, 3, 11/03/2021, 2
2, 1, 10/01/2021, 6
2, 1, 10/01/2021, 6
2, 2, 11/04/2021, 6
2, 2, 11/04/2021, 6
2, 3, 10/07/2021, 6
3, 1, 10/02/2021,
3, 2, 10/03/2021,
3, 3, ,
;
run;
Try this
data df;
infile datalines delimiter=',';
attrib
id length = $8
cycle length = $8
date length = 8 format = mmddyy10. informat = mmddyy10.
;
input ID cycle date;
datalines;
1, 1, 10/01/2021
1, 2, 10/02/2021
1, 3, 10/03/2021
2, 1, 10/01/2021
2, 1, 10/01/2021
2, 2, 10/04/2021
2, 2, 10/04/2021
2, 3, 10/07/2021
3, 1, 10/02/2021
3, 2, 10/03/2021
3, 3,
;
run;
data want(drop = d1 d3);
do _N_ = 1 by 1 until (last.id);
set df;
by id;
if cycle = 1 then d1 = date;
if cycle = 3 then d3 = date;
end;
do _N_ = 1 to _N_;
set df;
duration = d3 - d1;
output;
end;
run;
Result:
id cycle date duration
1 1 10/01/2021 2
1 2 10/02/2021 2
1 3 10/03/2021 2
2 1 10/01/2021 6
2 1 10/01/2021 6
2 2 10/04/2021 6
2 2 10/04/2021 6
2 3 10/07/2021 6
3 1 10/02/2021 .
3 2 10/03/2021 .
3 3 . .