Search code examples
mergesascharacternumeric

SAS collapse data by identifiers


I have a bunch of data that I would like to collapse by a few identifier variables and keep only the non-missing values of other variables. For each unique combination of id, title, info there is 1 value of var1/var2/var3 that isn't missing that I would like to keep. Note that var3 is numeric while var1/var2 are character.

I have data like:

id  title info                 var1     var2        var3
1   foo   Some string here     string 1     
1   foo   Some string here              string 2 
1   foo   Some string here                          number 3
2   bar   A different string   string 4 string 5    
2   bar   A different string                        number 6
3   baz   Something else       string 7             number 8

And I want it to be like:

id  title info                 var1     var2        var3
1   foo   Some string here     string 1 string 2   number 3
2   bar   A different string   string 4 string 5   number 6 
3   baz   Something else       string 7            number 8

Thanks!


Solution

  • The UPDATE statement can handle that. The last non-missing value will be used. The UPDATE statement takes exactly two datasets, the original and the transaction. The original dataset must have only one observation per by group. But you can use your single dataset by using the OBS=0 dataset option to create an empty master dataset.

    First here is your sample data.

    data have ;
      infile cards dsd truncover ;
      length id 8 title info var1-var3 $20 ;
      input id -- var3 ;
    cards;
    1,foo,Some string here,string 1,,
    1,foo,Some string here,,string 2,
    1,foo,Some string here,,,number 3
    2,bar,A different string,string 4,string 5,
    2,bar,A different string,,,number 6
    3,baz,Something else,string 7,,number 8
    ;;;;
    

    Here is the step to collapse.

    data want ;
      update have(obs=0) have ;
      by id title info;
    run;