I have a work progress sheet to make. So if we have a table with work progress as new, progress, start, end and restart and some Rules were:
First, when work is NEW, start date set as '1/01/2013' and other following work_progress set same.
Second, if work is END-ed and again ADD-ed,start date is set as '01/01/2016' (Below: Work_id=3). The following work_progress must have same value.
Last case, when work (work_id:1,2) RESTARTs, start date is set beginning of work received. The later dates must follow same '01/05/2017'. Below is the dataset outputted with my logic.
text -indent
+---------+---------------+-------------------+------------+------------+
| work_id | work_progress | received_date | start | end |
+---------+---------------+-------------------+------------+------------+
| 1 | NEW | November 19, 2016 | 01/01/2013 | 31/12/2020 |
| 1 | PROGRESS | December 25, 2016 | 01/01/2013 | 31/12/2020 |
| 1 | END | January 1, 2017 | 01/01/2013 | 02/02/2017 |
| 1 | RESTART | February 5, 2017 | 01/05/2017 | 31/12/2020 |
| 1 | PROGRESS | March 20, 2017 | 01/01/2013 | 31/12/2020 |
| 2 | NEW | November 19, 2016 | 01/01/2013 | 31/12/2020 |
| 2 | PROGRESS | December 25, 2016 | 01/01/2013 | 31/12/2020 |
| 2 | END | January 1, 2017 | 01/01/2013 | 31/12/2020 |
| 2 | RESTART | February 5, 2017 | 01/05/2017 | 31/12/2020 |
| 2 | PROGRESS | March 20, 2017 | 01/01/2013 | 31/12/2020 |
| 3 | NEW | November 19, 2016 | 01/01/2013 | 31/12/2020 |
| 3 | END | December 25, 2016 | 01/01/2013 | 02/02/2017 |
| 3 | NEW | January 1, 2017 | 01/01/2016 | 31/12/2020 |
| 3 | END | February 5, 2017 | 01/01/2013 | 02/02/2017 |
| 3 | END | March 20, 2017 | 01/01/2013 | 03/03/2017 |
| 3 | END | April 21, 2017 | 01/01/2013 | 04/04/2017 |
+---------+---------------+-------------------+------------+------------+
What i actually what my output to be:
+---------+---------------+-------------------+------------+------------+
| work_id | work_progress | received_date | start | end |
+---------+---------------+-------------------+------------+------------+
| 1 | NEW | November 19, 2016 | 01/01/2013 | 31/12/2020 |
| 1 | PROGRESS | December 25, 2016 | 01/01/2013 | 31/12/2020 |
| 1 | END | January 1, 2017 | 01/01/2013 | 02/02/2017 |
| 1 | RESTART | February 5, 2017 | 01/05/2017 | 31/12/2020 |
| 1 | PROGRESS | March 20, 2017 | 01/05/2017 | 31/12/2020 |
| 2 | NEW | November 19, 2016 | 01/01/2013 | 31/12/2020 |
| 2 | PROGRESS | December 25, 2016 | 01/01/2013 | 31/12/2020 |
| 2 | END | January 1, 2017 | 01/01/2013 | 31/12/2020 |
| 2 | RESTART | February 5, 2017 | 01/05/2017 | 31/12/2020 |
| 2 | PROGRESS | March 20, 2017 | 01/05/2017 | 31/12/2020 |
| 3 | NEW | November 19, 2016 | 01/01/2013 | 31/12/2020 |
| 3 | END | December 25, 2016 | 01/01/2013 | 02/02/2017 |
| 3 | NEW | January 1, 2017 | 01/01/2016 | 31/12/2020 |
| 3 | END | February 5, 2017 | 01/01/2016 | 02/02/2017 |
| 3 | END | March 20, 2017 | 01/01/2016 | 02/02/2017 |
| 3 | END | April 21, 2017 | 01/01/2016 | 02/02/2017 |
+---------+---------------+-------------------+------------+------------+
Requirement:
I require to use lag here to retain the start and end dates. I already have implemented half of my problem's logic except this lag usage part. Part of sas code:
data m_out_ds;
set m_in_ds;
by work_id work_received_date;
/*--------
Some logic to derive my rules, that gave output, first table above.
----------*/
prevstart = lag(start);
prevend = lag(end);
prev_work_progress = lag(work_progress);
if work_progress = 'END' and prev_work_progress = 'END' then end = prevend;
/*---This gave 02/02/2017 for march received date only,
but we require for april too, obvious the work has ended.----*/
if work_progress = 'PROGRESS' and prev_work_progress ='RESTART'
then start = prevstart;
/*---This however worked---*/
run;
Let me know if you've trouble understanding this. Thanks.
This seems to match your data, but I still not sure I understand the rules. First let's make your text into data.
data have ;
infile cards dsd dlm='|' truncover ;
row+1;
length work_id 8 work_progress $8 received_date start end 8 ;
informat received_date anydtdte. start end ddmmyy.;
format received_date start end yymmdd10.;
input work_id -- end ;
CARDS;
1|NEW | November 19, 2016|01/01/2013|31/12/2020
1|PROGRESS| December 25, 2016|01/01/2013|31/12/2020
1|END | January 1, 2017 |01/01/2013|02/02/2017
1|RESTART | February 5, 2017 |01/05/2017|31/12/2020
1|PROGRESS| March 20, 2017 |01/01/2013|31/12/2020
2|NEW | November 19, 2016|01/01/2013|31/12/2020
2|PROGRESS| December 25, 2016|01/01/2013|31/12/2020
2|END | January 1, 2017 |01/01/2013|31/12/2020
2|RESTART | February 5, 2017 |01/05/2017|31/12/2020
2|PROGRESS| March 20, 2017 |01/01/2013|31/12/2020
3|NEW | November 19, 2016|01/01/2013|31/12/2020
3|END | December 25, 2016|01/01/2013|02/02/2017
3|NEW | January 1, 2017 |01/01/2016|31/12/2020
3|END | February 5, 2017 |01/01/2013|02/02/2017
3|END | March 20, 2017 |01/01/2013|03/03/2017
3|END | April 21, 2017 |01/01/2013|04/04/2017
;
data want ;
infile cards dsd dlm='|' truncover ;
row+1;
length work_id 8 work_progress $8 received_date start end 8 ;
informat received_date anydtdte. start end ddmmyy.;
format received_date start end yymmdd10.;
input work_id -- end ;
CARDS;
1|NEW |November 19, 2016|01/01/2013|31/12/2020
1|PROGRESS |December 25, 2016|01/01/2013|31/12/2020
1|END |January 1, 2017 |01/01/2013|02/02/2017
1|RESTART |February 5, 2017 |01/05/2017|31/12/2020
1|PROGRESS |March 20, 2017 |01/05/2017|31/12/2020
2|NEW |November 19, 2016|01/01/2013|31/12/2020
2|PROGRESS |December 25, 2016|01/01/2013|31/12/2020
2|END |January 1, 2017 |01/01/2013|31/12/2020
2|RESTART |February 5, 2017 |01/05/2017|31/12/2020
2|PROGRESS |March 20, 2017 |01/05/2017|31/12/2020
3|NEW |November 19, 2016|01/01/2013|31/12/2020
3|END |December 25, 2016|01/01/2013|02/02/2017
3|NEW |January 1, 2017 |01/01/2016|31/12/2020
3|END |February 5, 2017 |01/01/2016|02/02/2017
3|END |March 20, 2017 |01/01/2016|02/02/2017
3|END |April 21, 2017 |01/01/2016|02/02/2017
;
Now let's try to convert it.
data try ;
set have ;
by work_id;
retain new_start new_end ;
format new_start new_end yymmdd10.;
if first.work_id then call missing(of new_start new_end);
if work_progress in ('NEW','RESTART') then new_start=start ;
start=coalesce(new_start,start);
if work_progress in ('END') then do;
if missing(new_end) then new_end=end ;
end=coalesce(new_end,end);
end;
run;
proc compare data=want compare=try;
id row;
run;
proc print data=try; run;