Search code examples
sasretain

SAS: repeat the last value by ID


I have this database:

data temp;
input ID date type ;
  datalines;
 1 10/11/2006   1      
 1 10/12/2006   2      
 1 15/01/2007   2      
 1 20/01/2007   3    
 2 10/08/2008   1        
 2 11/09/2008   1        
 2 17/10/2008   1        
 2 12/11/2008   2    
 2 10/12/2008   3       
 ;

I would like to create a new column where I repeat the last date by ID:

data temp;
input ID date type  last_date;
  datalines;
 1 10/11/2006   1        20/01/2007
 1 10/12/2006   2        20/01/2007
 1 15/01/2007   2        20/01/2007
 1 20/01/2007   3        20/01/2007
 2 10/08/2008   1        10/12/2008
 2 11/09/2008   1        10/12/2008
 2 17/10/2008   1        10/12/2008
 2 12/11/2008   2        10/12/2008
 2 10/12/2008   3        10/12/2008
 ;

I have tried this code but it doesn't work:

  data temp;
  set temp;
  IF last.ID then last_date= .;
  RETAIN last_date;
  if   missing(last_date) then last_date= date;
  run;

Thank you in advance for your help!


Solution

  • First thing is that FIRST.ID and LAST.ID variables are not created in the data step unless you include the variable ID in the BY statement.

    Second is that to attach the last date to each observation you need to process the data twice. Your current code (if the BY statement is added) will only assign a value to LAST_DATE on the last observation of the by group.

    One way to do this is to re-sort the data by descending date within each by group then you could use BY ID and FIRST.ID and RETAIN.

    proc sort data=have;
       by id descending date;
    run;
    data want;
       set have;
       by id descending date;
       if first.id then last_date=date;
       retain last_date;
       format last_date ddmmyy10.;
    run;
    

    Here is a way to use the original sort order using what is called a double DOW loop. By placing the SET/BY statements inside of a DO loop you can read all of the observations for a group in a single pass of the data step. You then add a second DO loop to re-process that BY group and use the information calculated in the first loop and write out the observations.

    data want;
    do until (last.id);
      set have;
      by id;
    end;
    last_date=date ;
    format last_date ddmmyy10.;
    do until (last.id);
      set have;
      by id; 
      output;
    end;
    run;