Search code examples
dataframegroup-bysas

How to calculate duration based on ID groups between two specific periods that may have missing dates?


Let's say I have this example data set


data df;
  infile datalines delimiter=','; 
  attrib
  id length = $8
  cycle length = $8
  date length = 8 format = mmddyy10. informat = mmddyy10.
  ;
input ID cycle date;
datalines;
1, 1, 10/01/2021
1, 2, 10/02/2021
1, 3, 11/03/2021
2, 1, 10/01/2021
2, 1, 10/01/2021
2, 2, 11/04/2021
2, 2, 11/04/2021
2, 3, 10/07/2021
3, 1, 10/02/2021
3, 2, 10/03/2021
3, 3,           
;
run;

I wish to calculate the duration between cycle 3 and cycle 1 for each of the ID with the desired result is the output of the following code (we can't use max(date) - min(date) because there are some missing dates in cycles 3 that's why I am asking the question).


data df_duration;
  infile datalines delimiter=','; 
  attrib
  id length = $8
  cycle length = $8
  date length = 8 format = mmddyy10. informat = mmddyy10.
  duration length = $8 
  ;
input ID cycle date duration;
datalines;
1, 1, 10/01/2021, 2
1, 2, 10/02/2021, 2
1, 3, 11/03/2021, 2
2, 1, 10/01/2021, 6
2, 1, 10/01/2021, 6
2, 2, 11/04/2021, 6
2, 2, 11/04/2021, 6
2, 3, 10/07/2021, 6
3, 1, 10/02/2021,  
3, 2, 10/03/2021, 
3, 3,           ,  
;
run;


Solution

  • Try this

    data df;
      infile datalines delimiter=','; 
      attrib
      id length = $8
      cycle length = $8
      date length = 8 format = mmddyy10. informat = mmddyy10.
      ;
    input ID cycle date;
    datalines;
    1, 1, 10/01/2021
    1, 2, 10/02/2021
    1, 3, 10/03/2021
    2, 1, 10/01/2021
    2, 1, 10/01/2021
    2, 2, 10/04/2021
    2, 2, 10/04/2021
    2, 3, 10/07/2021
    3, 1, 10/02/2021
    3, 2, 10/03/2021
    3, 3,           
    ;
    run;
    
    
    data want(drop = d1 d3);
    
       do _N_ = 1 by 1 until (last.id);
          set df;
          by id;
          if cycle = 1 then d1 = date;
          if cycle = 3 then d3 = date;
       end;
    
       do _N_ = 1 to _N_;
          set df;
          duration = d3 - d1;
          output;
       end;
    
    run;
    

    Result:

    id cycle date       duration
    1  1     10/01/2021 2
    1  2     10/02/2021 2
    1  3     10/03/2021 2
    2  1     10/01/2021 6
    2  1     10/01/2021 6
    2  2     10/04/2021 6
    2  2     10/04/2021 6
    2  3     10/07/2021 6
    3  1     10/02/2021 .
    3  2     10/03/2021 .
    3  3     .          .