Hello I am trying to generate a group identifier variable in SAS that is observable for the first member of the group and is missing for all subsequent members of that same group. Ideally I would also like it to enumerate the groups as it goes down. For example, I would like to create VAR3 out of VAR1
I know that is STATA something like this could be very easily done as follows:
egen VAR3=group(VAR1)
replace VAR3=. if VAR1[_n]==VAR1[_n-1]
But I am at a lost as to how to do it in SAS. I know that you can do something similar to it doing this:
data zzz; set yyy;
g_obs=_n_; run;
Proc sql; create table X as
select a.VAR1, a.VAR2, case where b.VAR1 ne a.VAR1 then a.VAR1 else . end as VAR3
from zzz a left join zzz b where a.VAR1=b.VAR1 and a.g_obs=(b.g_obs-1); quit;
The problem is that this doesn't get me that enumeration.
Is there anyone who knows how to do this?
Thank you in advance.
Considering your example:
data have;
infile datalines4 delimiter=",";
input var1 $ var2;
datalines4;
a,1
a,2
b,3
b,5
b,45
c,23
c,7
;;;;
The following get you the desired output:
data want;
retain _id_ 1;
set have;
by var1 notsorted;
if first.var1 = 1 then do;
var3 = _id_;
_id_ + 1;
end;
else var3=.;
drop _id_;
run;
The retain _id_ 1
statement causes the _id_
to retain its value (set to one initially) from one iteration of the data step to the next. Then, making use of the FIRST.variable
it is possible to compute the group identifier. Basically, the FIRST.variable assigns the value of one for the first observation in a by group and a value of zero for all other observations in the by groups. Keep in mind that the data need to be sorted by group before applying the FIRST.variable.
Hence, if the data is not already sorted by var1
like it is actually the case in the example, consider adding a proc sort
by var1
before the data step otherwise an error will be thrown.
Result (want
):