Search code examples
saslagretain

SAS : Understanding lag function to retain dates based on work progess


I have a work progress sheet to make. So if we have a table with work progress as new, progress, start, end and restart and some Rules were:

  1. First, when work is NEW, start date set as '1/01/2013' and other following work_progress set same.

  2. Second, if work is END-ed and again ADD-ed,start date is set as '01/01/2016' (Below: Work_id=3). The following work_progress must have same value.

  3. Last case, when work (work_id:1,2) RESTARTs, start date is set beginning of work received. The later dates must follow same '01/05/2017'. Below is the dataset outputted with my logic.

text -indent

+---------+---------------+-------------------+------------+------------+
| work_id | work_progress |   received_date   |   start    |    end     |
+---------+---------------+-------------------+------------+------------+
|       1 | NEW           | November 19, 2016 | 01/01/2013 | 31/12/2020 |
|       1 | PROGRESS      | December 25, 2016 | 01/01/2013 | 31/12/2020 |
|       1 | END           | January 1, 2017   | 01/01/2013 | 02/02/2017 |
|       1 | RESTART       | February 5, 2017  | 01/05/2017 | 31/12/2020 |
|       1 | PROGRESS      | March 20, 2017    | 01/01/2013 | 31/12/2020 |
|       2 | NEW           | November 19, 2016 | 01/01/2013 | 31/12/2020 |
|       2 | PROGRESS      | December 25, 2016 | 01/01/2013 | 31/12/2020 |
|       2 | END           | January 1, 2017   | 01/01/2013 | 31/12/2020 |
|       2 | RESTART       | February 5, 2017  | 01/05/2017 | 31/12/2020 |
|       2 | PROGRESS      | March 20, 2017    | 01/01/2013 | 31/12/2020 |
|       3 | NEW           | November 19, 2016 | 01/01/2013 | 31/12/2020 |
|       3 | END           | December 25, 2016 | 01/01/2013 | 02/02/2017 |
|       3 | NEW           | January 1, 2017   | 01/01/2016 | 31/12/2020 |
|       3 | END           | February 5, 2017  | 01/01/2013 | 02/02/2017 |
|       3 | END           | March 20, 2017    | 01/01/2013 | 03/03/2017 |
|       3 | END           | April 21, 2017    | 01/01/2013 | 04/04/2017 |
+---------+---------------+-------------------+------------+------------+  

What i actually what my output to be:

+---------+---------------+-------------------+------------+------------+
| work_id | work_progress |   received_date   |   start    |    end     |
+---------+---------------+-------------------+------------+------------+
|       1 | NEW           | November 19, 2016 | 01/01/2013 | 31/12/2020 |
|       1 | PROGRESS      | December 25, 2016 | 01/01/2013 | 31/12/2020 |
|       1 | END           | January 1, 2017   | 01/01/2013 | 02/02/2017 |
|       1 | RESTART       | February 5, 2017  | 01/05/2017 | 31/12/2020 |
|       1 | PROGRESS      | March 20, 2017    | 01/05/2017 | 31/12/2020 |
|       2 | NEW           | November 19, 2016 | 01/01/2013 | 31/12/2020 |
|       2 | PROGRESS      | December 25, 2016 | 01/01/2013 | 31/12/2020 |
|       2 | END           | January 1, 2017   | 01/01/2013 | 31/12/2020 |
|       2 | RESTART       | February 5, 2017  | 01/05/2017 | 31/12/2020 |
|       2 | PROGRESS      | March 20, 2017    | 01/05/2017 | 31/12/2020 |
|       3 | NEW           | November 19, 2016 | 01/01/2013 | 31/12/2020 |
|       3 | END           | December 25, 2016 | 01/01/2013 | 02/02/2017 |
|       3 | NEW           | January 1, 2017   | 01/01/2016 | 31/12/2020 |
|       3 | END           | February 5, 2017  | 01/01/2016 | 02/02/2017 |
|       3 | END           | March 20, 2017    | 01/01/2016 | 02/02/2017 |
|       3 | END           | April 21, 2017    | 01/01/2016 | 02/02/2017 |
+---------+---------------+-------------------+------------+------------+    

Requirement:

  1. Start date should be added to later work progress when NEW and RESTART.
  2. In end date in work_id=3 and work_progress= END. The march and april both should have end date that of feb.

I require to use lag here to retain the start and end dates. I already have implemented half of my problem's logic except this lag usage part. Part of sas code:

data m_out_ds;
 set m_in_ds;
 by work_id work_received_date;
 /*--------
 Some logic to derive my rules, that gave output, first table above.
  ----------*/
 prevstart = lag(start);
 prevend = lag(end); 
 prev_work_progress = lag(work_progress);

 if work_progress = 'END' and prev_work_progress = 'END' then end = prevend;

/*---This gave 02/02/2017 for march received date only, 
  but we require for april too, obvious the work has ended.----*/

if work_progress = 'PROGRESS' and prev_work_progress ='RESTART' 
  then start = prevstart; 

/*---This however worked---*/

run;

Let me know if you've trouble understanding this. Thanks.


Solution

  • This seems to match your data, but I still not sure I understand the rules. First let's make your text into data.

    data have ;
      infile cards dsd dlm='|' truncover ;
      row+1;
      length work_id 8 work_progress $8 received_date start end 8 ;
      informat received_date anydtdte. start end ddmmyy.;
      format received_date  start end yymmdd10.;
      input work_id -- end ;
    CARDS;
       1|NEW     | November 19, 2016|01/01/2013|31/12/2020 
       1|PROGRESS| December 25, 2016|01/01/2013|31/12/2020 
       1|END     | January 1, 2017  |01/01/2013|02/02/2017 
       1|RESTART | February 5, 2017 |01/05/2017|31/12/2020 
       1|PROGRESS| March 20, 2017   |01/01/2013|31/12/2020 
       2|NEW     | November 19, 2016|01/01/2013|31/12/2020 
       2|PROGRESS| December 25, 2016|01/01/2013|31/12/2020 
       2|END     | January 1, 2017  |01/01/2013|31/12/2020 
       2|RESTART | February 5, 2017 |01/05/2017|31/12/2020 
       2|PROGRESS| March 20, 2017   |01/01/2013|31/12/2020 
       3|NEW     | November 19, 2016|01/01/2013|31/12/2020 
       3|END     | December 25, 2016|01/01/2013|02/02/2017 
       3|NEW     | January 1, 2017  |01/01/2016|31/12/2020 
       3|END     | February 5, 2017 |01/01/2013|02/02/2017 
       3|END     | March 20, 2017   |01/01/2013|03/03/2017 
       3|END     | April 21, 2017   |01/01/2013|04/04/2017 
    ;
    data want ;
      infile cards dsd dlm='|' truncover ;
      row+1;
      length work_id 8 work_progress $8 received_date start end 8 ;
      informat received_date anydtdte. start end ddmmyy.;
      format received_date  start end yymmdd10.;
      input work_id -- end ;
    CARDS;
       1|NEW        |November 19, 2016|01/01/2013|31/12/2020 
       1|PROGRESS   |December 25, 2016|01/01/2013|31/12/2020 
       1|END        |January 1, 2017  |01/01/2013|02/02/2017 
       1|RESTART    |February 5, 2017 |01/05/2017|31/12/2020 
       1|PROGRESS   |March 20, 2017   |01/05/2017|31/12/2020 
       2|NEW        |November 19, 2016|01/01/2013|31/12/2020 
       2|PROGRESS   |December 25, 2016|01/01/2013|31/12/2020 
       2|END        |January 1, 2017  |01/01/2013|31/12/2020 
       2|RESTART    |February 5, 2017 |01/05/2017|31/12/2020 
       2|PROGRESS   |March 20, 2017   |01/05/2017|31/12/2020 
       3|NEW        |November 19, 2016|01/01/2013|31/12/2020 
       3|END        |December 25, 2016|01/01/2013|02/02/2017 
       3|NEW        |January 1, 2017  |01/01/2016|31/12/2020 
       3|END        |February 5, 2017 |01/01/2016|02/02/2017 
       3|END        |March 20, 2017   |01/01/2016|02/02/2017 
       3|END        |April 21, 2017   |01/01/2016|02/02/2017 
    ;
    

    Now let's try to convert it.

    data try ;
      set have ;
      by work_id;
      retain new_start new_end ;
      format new_start new_end yymmdd10.;
      if first.work_id then call missing(of new_start new_end);
      if work_progress in ('NEW','RESTART') then new_start=start ;
      start=coalesce(new_start,start);
      if work_progress in ('END') then do;
        if missing(new_end) then new_end=end ;
        end=coalesce(new_end,end);
      end;
    run;
    proc compare data=want compare=try;
      id row;
    run;
    proc print data=try; run;
    

    enter image description here