Search code examples
sasdatastep

Creating a single record from multiple records in SAS


I have a SAS Data set called coaches_assistants with the following structure. There are always only two records per TeamID.

TeamID     Team_City    CoachCode
123        Durham       Head_242
123        Durham       Assistant_876
124        London       Head_876
124        London       Assistant_922
125        Bath         Head_667
125        Bath         Assistant_786
126        Dover        Head_544
126        Dover        Assistant_978
...        ...          ....

What I'd like to do with this is to create a data set with an extra field called AssistantCode and make it look like:

TeamID     Team_City    HeadCode   AssistantCode
123        Durham       242        876
124        London       876        922
125        Bath         667        786
126        Dover        544        978
...        ...          ...        ...

If possible, I'd like to do this in a single DATA step (though I recognize that I might need a PROC SORT step first). I know how to do it in python or ruby or any traditional scripting languages, but I don't know how to do it in SAS.

What's the best way to do this?


Solution

  • Here are two possible solutions (one using a data step as requested and another using PROC SQL):

    data have;
       length TeamID $3 Team_City CoachCode $20; 
       input TeamID $ Team_City $ CoachCode $;
       datalines;
    123        Durham       Head_242
    123        Durham       Assistant_876
    124        London       Head_876
    124        London       Assistant_922
    125        Bath         Head_667
    125        Bath         Assistant_786
    126        Dover        Head_544
    126        Dover        Assistant_978
    run;
    
    /* A data step solution */
    proc sort data=have;
       by TeamID;
    run;
    
    data want1(keep=TeamID Team_City HeadCode AssistantCode);
       /* Define all variables, retain the new ones */
       length TeamID $3 Team_City $20 HeadCode $3 AssistantCode $3; 
       retain HeadCode AssistantCode; 
       set have;
          by TeamID;
       if CoachCode =: 'Head'
          then HeadCode = substr(CoachCode,6,3);
          else AssistantCode = substr(CoachCode,11,3);
       if last.TeamID;
    run;
    
    /* An SQL solution */
    proc sql noprint;
       create table want2 as
       select TeamID
            , max(Team_City) as Team_City
            , max(CASE WHEN CoachCode LIKE 'Head%'
                       THEN substr(CoachCode,6,3) ELSE ' '
                  END) LENGTH=3 as HeadCode
            , max(CASE WHEN CoachCode LIKE 'Assistant%'
                       THEN substr(CoachCode,11,3) ELSE ' '
                  END) LENGTH=3 as AssistantCode
       from have
       group by TeamID;
    quit;
    

    PROC SQL has the advantage of not requiring you to sort the data in advance.