Search code examples
sas

Merge and update one file based on another one


suppose to have two data sets (files). File 1 is composed by time-periods with a label for each one and File2 that contains sub-periods without labels. I need to add labels to File2 based on the time interval from File1 so that if the period has Label "x" and the sub-period is contained in the period of File1, the sub-period will take the label from the period of File1.

Can anyone help me please?

    data have1;
    input ID :$20. Start :date9. End :date9. Label :$20. Role :$20.;
    format start end yymmdd10.;
    cards;
    0001 01JAN2015 30APR2015 HospitalA   ex005
    0001 01MAY2015 31MAY2015 HospitalA   ex004
    0001 01JUN2015 31DEC2015 HospitalC   ex005
    0002 06FEB2018 08FEB2018 HospitalA   ex004
    0002 09FEB2018 31AUG2018 HospitalC   ex005
    0002 01SEP2018 31DEC2019 HospitalC   ex004
    0003 01JAN2019 30SEP2019 HospitalD   ex008
    0003 01OCT2019 31DEC2020 HospitalD   ex004
   ;
   File2:
   data have2;
   input ID :$20. Start :date9. End :date9.;
   format start end yymmdd10.;
   cards;
   0001 01JAN2015 30JAN2015 
   0001 31JAN2015 15FEB2015 
   0001 15FEB2015 30APR2015
   0001 01MAY2015 15MAY2015
   0001 16MAY2015 31MAY2015
   0001 01JUN2015 15SEP2015
   0001 16SEP2015 31DEC2015
   ......
   ;
   File3 desired output:
   data output;
   input ID :$20. Start :date9. End :date9. Label :$20. Role :$20.;
   format start end yymmdd10.;
   cards;
   0001 01JAN2015 30JAN2015 HospitalA  ex005
   0001 31JAN2015 15FEB2015 HospitalA  ex005
   0001 15FEB2015 30APR2015 HospitalA  ex005 
   0001 01MAY2015 15MAY2015 HospitalA  ex004
   0001 16MAY2015 31MAY2015 HospitalA  ex004
   0001 01JUN2015 15SEP2015 HospitalC  ex005
   0001 16SEP2015 31DEC2015 HospitalC  ex005
   ......
   ;

Solution

  • Try this

    data have1;
      input ID :$20. Start :date9. End :date9. Label :$20. Role :$20.;
      format start end yymmdd10.;
    cards;
    0001 01JAN2015 30APR2015 HospitalA ex005
    0001 01MAY2015 31MAY2015 HospitalA ex004
    0001 01JUN2015 31DEC2015 HospitalC ex005
    0002 06FEB2018 08FEB2018 HospitalA ex004
    0002 09FEB2018 31AUG2018 HospitalC ex005
    0002 01SEP2018 31DEC2019 HospitalC ex004
    0003 01JAN2019 30SEP2019 HospitalD ex008
    0003 01OCT2019 31DEC2020 HospitalD ex004
    ;
    
    data have2;
      input ID :$20. Start :date9. End :date9.;
      format start end yymmdd10.;
    cards;
    0001 01JAN2015 30JAN2015 
    0001 31JAN2015 15FEB2015 
    0001 15FEB2015 30APR2015 
    0001 01MAY2015 15MAY2015 
    0001 16MAY2015 31MAY2015 
    0001 01JUN2015 15SEP2015 
    0001 16SEP2015 31DEC2015 
    ;
    
    data want(drop = s e);
    
       if _N_ = 1 then do;
          dcl hash h(dataset : 'have1(rename = (Start = s End = e)', multidata : 'Y');
          h.definekey('ID');
          h.definedata('s', 'e', 'Label', 'Role');
          h.definedone();
          dcl hiter i('h');
       end;
    
       set have2;
    
       if 0 then set have1(rename = (Start = s End = e));
       call missing(s, e, Label, Role);
    
       do while (i.next() = 0);
          if Start >= s and End <= e then leave;
          else call missing(Label, Role);
       end;
    
    run;
    

    Result:

    ID    Start       End         Label      Role
    0001  2015-01-01  2015-01-30  HospitalA  ex005
    0001  2015-01-31  2015-02-15             
    0001  2015-02-15  2015-04-30  HospitalA  ex005
    0001  2015-05-01  2015-05-15  HospitalA  ex004
    0001  2015-05-16  2015-05-31             
    0001  2015-06-01  2015-09-15  HospitalC  ex005
    0001  2015-09-16  2015-12-31