Search code examples
sasdatastep

Group By Interpolation Based on the Previous Row


The goal is to add a new row whenever there is a gap between the date variable between two rows grouped by id. If the gap occurs, then duplicate a row that is first. However only the date feature should not be as the first row rather it should be incremented by one day.

Also, everything needs to be grouped by id. I need to achieve it without expanding the function.

data sample;
    input id date numeric_feature character_feature $;
    informat date yymmdd10.;
    datalines;

1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-03 1 B
2 2020-01-05 9 F
;

data sample;
    set sample;
    format date yymmdd10.;
run;

The desired result:

data sample;
    input id date numeric_feature character_feature $;
    informat date yymmdd10.;
    datalines;

1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-03 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-02 4 V
2 2020-01-03 1 B
2 2020-01-04 1 B
2 2020-01-05 9 F
;

data sample;
    set sample;
    format date yymmdd10.;
run;

Solution

  • You can perform a 1:1 self merge with the second self starting at row 2 in order to provide a lead value. A 1:1 merge does not use a BY statement.

    Example:

    data have;
        input id date numeric_feature character_feature $;
        informat date yymmdd10.;
          format date yymmdd10.;
        datalines;
    1 2020-01-01 5 A
    1 2020-01-02 3 Z
    1 2020-01-04 2 D
    1 2020-01-05 7 B
    2 2020-01-01 4 V
    2 2020-01-03 1 B
    2 2020-01-05 9 F
    ;
    
    data want;
      * 1:1 merge without by statement;
      merge 
        have              /* start at row 1 */    
        have ( firstobs=2 /* start at row 2 for lead values */
               keep=id date            /* more data set options that prepare the lead */
               rename = ( id=nextid
                          date=nextdate
             )) 
      ;
    
      output;
    
      flag = '*';                  /* marker for filled in dates */
      if id = nextid then
        do date=date+1 to nextdate-1;
          output;
        end;
    
      drop next:;
    run;
    

    Result flagging filled in dates

    enter image description here