This is an extension of an earlier question. (Drop observations once condition is met by multiple variables).
I have the following data and used one of the existing answered questions to solve my data problem but could not get what I want. Here is what I have in my data
Have:
id Date Evt_Type Flag Amt1 Amt2
101 2/2/2019 Fee 5
101 2/3/2019 REF1 Y 5
101 2/4/2019 Fee 10
101 2/6/2019 REF2 Y 10
101 2/7/2019 Fee 4
101 2/8/2019 REF1
102 2/2/2019 Fee 25
102 2/2/2019 REF1 N 25
103 2/3/2019 Fee 10
103 2/4/2019 REF1 Y 10
103 2/5/2019 Fee 10
Want:
id Date Evt_Type Flag Amt1 Amt2
101 2/2/2019 Fee 5
101 2/3/2019 REF1 Y 5
101 2/4/2019 Fee 10
101 2/6/2019 REF2 Y 10
101 2/7/2019 Fee 4
101 2/8/2019 REF1
102 2/2/2019 Fee 25
102 2/2/2019 REF1 N 25
103 2/3/2019 Fee 10
103 2/4/2019 REF1 Y 10
I tried the following
data want;
_max_n_with_Y = 1e12;
do _n_ = 1 by 1 until (last.id);
set have;
by id;
if flag='Y' then _max_n_with_Y = _n_;
end;
do _n_ = 1 to _n_;
set have;
if _n_ <= _max_n_with_Y then OUTPUT;
end;
drop _:;
run;
Any help is appreciated.
Thanks
The important 'landmark' is the row with flag='Y'
The extra criteria for outputting rows post-landmark complicate the state machine being coded to track (or compute) the row number (_n_
) for last output of the group.
The row='Y'
state is easily known. Unconditional use of LAG
can be used to examine the post-Y state. SAS IF
statements do not have short circuit evaluation, so as long as the LAG
is not in a subordinate THEN
clause, the LAG
stacks will be appropriate for the task.
Example:
data have;
attrib
id format=4.
date informat=mmddyy10. format=mmddyy10.
evt_type length=$4
flag length=$1
amt1 amt2 format=4.
;
input
id Date Evt_Type Flag Amt1 Amt2; datalines;
101 2/2/2019 Fee . 5 .
101 2/3/2019 REF1 Y . 5
101 2/4/2019 Fee . 10 .
101 2/6/2019 REF2 Y . 10
101 2/7/2019 Fee . 4 .
101 2/8/2019 REF1 . . .
102 2/2/2019 Fee . 25 .
102 2/2/2019 REF1 N 25 .
103 2/3/2019 Fee . 10 .
103 2/4/2019 REF1 Y . 10
103 2/5/2019 Fee . 10 .
;
data want;
_y_n = 1e12;
do _n_ = 1 by 1 until (last.id);
set have;
by id;
if flag='Y' then _y_n = _n_;
/* rule: post Y output of two rows should only occur once, and at the rows
* immediately succeeding the Y row
*/
if _n_ = _y_n + 2 /* is this row 2 after a Y */
and lag(evt_type) = 'Fee' /* is first row after Y Fee */
and evt_type =: 'REF' /* is second row after Y REF# */
then
_upto_n = _n_;
end;
_upto_n = max (_upto_n, _y_n);
do _n_ = 1 to _n_;
set have;
if _n_ <= _upto_n then OUTPUT;
end;
drop _:;
run;
Note, regarding:
if _n_ = _y_n + 2 /* is this row 2 after a Y */
and lag(evt_type) = 'Fee' /* is first row after Y Fee */
and evt_type =: 'REF' /* is second row after Y REF# */
then
_upto_n = _n_;
For the row 2 after Y,
LAG2(<var>) is the <var> value from the Y row
LAG (<var>) is the <var> value from the Y row+1
<var> is the <var> value from the Y row+2, which is the current row