Suppose I have a dataset with three variables:
ID Year Status
1 2017 Y
1 2017 N
1 2018 N
1 2018 Y
2 2017 Y
2 2017
2 2018 N
2 2018 N
I would like to create a fourth column called NEW which has three possible values ('Yonly' 'Nonly' and 'yesno'). In the example above the output will be:
ID Year Status NEW
1 2017 Y
1 2017 N yesno
1 2018 N
1 2018 Y yesno
2 2017 Y
2 2017 yesonly
2 2018 N
2 2018 N noonly
Note: could have missing data. My solution so far is wrong:
retain tmp '';
by ID Year;
if Status='Y' then tmp='Yonly';
if Status='N' then tmp='Nonly';
if tmp='Yonly' and Status='N' then tmp='yesno';
if tmp='Nonly' and Status='Y' then tmp='yesno';
if last.Year=1 then NEW=tmp;
Please help? Any method will do, you don't have to use RETAIN.
Make sure to define the length of TMP. Your current code will set the length of TMP to 1 since the first usage is the initial value listed in the RETAIN statement.
You are missing an initialization step for when starting a new group.
if first.year then tmp=' ';
Your method can only set the result on the last record for each group. If you want all observations in a group to have the same value then I would suggest using a double DOW loop. The first loop can be used to see if there are any 'Y' or 'N' status. Then you can calculate your NEW variable. Then a second loop will read in the data for the group again and write the values out. Because all observations for a group are processed in a single data step iteration there is no need to use RETAIN.
data want ;
do until (last.year) ;
set have ;
by id year ;
y = y or (status='Y');
n = n or (status='N');
end;
length new $8;
if Y and N then new='yesno';
else if Y then new='yesonly';
else if N then new='noonly';
else new='none';
drop y n ;
do until (last.year) ;
set have ;
by id year ;
output ;
end;
run;