I have data that looks like this:
ID | Year |
---|---|
A | 1989 |
A | 1989 |
A | 1989 |
B | 1990 |
B | 1990 |
C | 1991 |
C | 1992 |
And want to output data with an additional Flag variable such that the first record of a duplicate or higher is 0, the second is 1, etc. based on both ID and Year. Observations that have a unique combination of ID and Year would not be flagged:
ID | Year | Flag |
---|---|---|
A | 1989 | 0 |
A | 1989 | 1 |
A | 1989 | 2 |
B | 1990 | 0 |
B | 1990 | 1 |
C | 1991 | |
C | 1992 |
I am only familiar with proc sort nodupkey as a way to eliminate duplicates, but I want to instead retain all data and add a flagging variable.
Try this
data have;
input ID $ Year;
datalines;
A 1989
A 1989
A 1989
B 1990
B 1990
C 1991
C 1992
;
data want;
do _N_ = 1 by 1 until (last.Year);
set have;
by ID Year;
end;
do flag = 0 to _N_ - 1;
set have;
if _N_ = 1 then flag = .;
output;
end;
run;
Result:
ID Year flag
A 1989 0
A 1989 1
A 1989 2
B 1990 0
B 1990 1
C 1991 .
C 1992 .