I have a SAS Data set called coaches_assistants
with the following structure. There are always only two records per TeamID
.
TeamID Team_City CoachCode
123 Durham Head_242
123 Durham Assistant_876
124 London Head_876
124 London Assistant_922
125 Bath Head_667
125 Bath Assistant_786
126 Dover Head_544
126 Dover Assistant_978
... ... ....
What I'd like to do with this is to create a data set with an extra field called AssistantCode
and make it look like:
TeamID Team_City HeadCode AssistantCode
123 Durham 242 876
124 London 876 922
125 Bath 667 786
126 Dover 544 978
... ... ... ...
If possible, I'd like to do this in a single DATA step (though I recognize that I might need a PROC SORT step first). I know how to do it in python or ruby or any traditional scripting languages, but I don't know how to do it in SAS.
What's the best way to do this?
Here are two possible solutions (one using a data step as requested and another using PROC SQL):
data have;
length TeamID $3 Team_City CoachCode $20;
input TeamID $ Team_City $ CoachCode $;
datalines;
123 Durham Head_242
123 Durham Assistant_876
124 London Head_876
124 London Assistant_922
125 Bath Head_667
125 Bath Assistant_786
126 Dover Head_544
126 Dover Assistant_978
run;
/* A data step solution */
proc sort data=have;
by TeamID;
run;
data want1(keep=TeamID Team_City HeadCode AssistantCode);
/* Define all variables, retain the new ones */
length TeamID $3 Team_City $20 HeadCode $3 AssistantCode $3;
retain HeadCode AssistantCode;
set have;
by TeamID;
if CoachCode =: 'Head'
then HeadCode = substr(CoachCode,6,3);
else AssistantCode = substr(CoachCode,11,3);
if last.TeamID;
run;
/* An SQL solution */
proc sql noprint;
create table want2 as
select TeamID
, max(Team_City) as Team_City
, max(CASE WHEN CoachCode LIKE 'Head%'
THEN substr(CoachCode,6,3) ELSE ' '
END) LENGTH=3 as HeadCode
, max(CASE WHEN CoachCode LIKE 'Assistant%'
THEN substr(CoachCode,11,3) ELSE ' '
END) LENGTH=3 as AssistantCode
from have
group by TeamID;
quit;
PROC SQL has the advantage of not requiring you to sort the data in advance.