Search code examples
sasaggregationproc-sqlenterprise-guide4gl

How to create new columns with names of columns with values in descending order in SAS Enterprise Guide / PROC SQL?


I have table in SAS Enterprise Guide like below:

ID   | COL_A | COL_B | COL_C
-----|-------|-------|------
111  | 10    | 20    | 30
222  | 15    | 80    | 10
333  | 11    | 10    | 20
444  | 20    | 5     | 20

Requirements:

  • I need to create new columns: TOP_1, TOP_2, TOP_3 where will be names of columns from the highest value from COL_A, COL_B, COL_C columns to the lowest per ID

  • If for example 2 or more columns have the same highest value take the first under the alphabet.

  • In TOP_1 - name of column with the hihest value per ID In TOP_2 - name of column with the second highest value per ID In TOP_3 - name of column with the third highest value per ID

Desire output:

ID   | COL_A | COL_B | COL_C  | TOP_1  | TOP_2   | TOP_3
-----|-------|-------|--------|--------|---------|--------- 
111  | 10    | 20    | 30     | COL_C  | COL_B   | COL_A
222  | 15    | 80    | 10     | COL_B  | COL_A   | COL_C
333  | 11    | 10    | 20     | COL_C  | COL_A   | COL_B
444  | 20    | 5     | 20     | COL_A  | COL_C   | COL_B

Because:

  • for ID = 111 the highest value is in COL_C, co name "COL_C" going to column "TOP_1", second highest value is in COL_B, so name "COL_B" going to column "TOP_2" and so on...
  • for ID = 444 two columns have the highest value, so we have to use alphabet criteria and in column "TOP_1" is name "COL_A" and name "COL_B is in column "TOP_2"

How can I do that in SAS Enterprise Gude or in PROC SQL ?


Solution

  • First let's convert your listing into an actual dataset.

    data have;
      input ID    COL_A  COL_B  COL_C ;
    cards;
    111   10     20     30
    222   15     80     10
    333   11     10     20
    444   20     5      20
    ;
    

    If you use PROC TRANSPOSE to covert your COL_: into observations.

    proc transpose data=have out=tall;
      by id col_a col_b col_c;
      var col_a col_b col_c;
    run;
    

    You can then sort by descending values (and ascending variable name):

    proc sort;
      by id col_a col_b col_c descending col1 _name_;
    run;
    

    And use another PROC TRANSPOSE to make your new variables:

    proc transpose data=tall out=want(drop=_name_ _label_) prefix=TOP_;
      by id col_a col_b col_c;
      var _name_;
    run;
    

    If the data is really large (or you have a lot more than 3 columns to check) you might want to eliminate COL_A COL_B and COL_C from the BY group and instead just merge the resulting TOP_: variable back onto the original dataset.