Search code examples
sasaggregateproc-sqlenterprise-guide4gl

How to aggretage col1 per ID and val1 per ID and values in col1 in SAS Enterprise Gude or PROC SQL?


I have table in SAS Enterprise Guide like below:

ID  | COL1 | VAL1 |  
----|------|------|
111 | A    | 10   | 
111 | A    | 5    | 
111 | B    | 10   | 
222 | B    | 20   |
333 | C    | 25   |
... | ...  | ...  | 

And I need to aggregate above table to know:

  • sum of values from COL1 per ID
  • sum of values from VAL1 per COL1 per ID

So, as a result I need something like below:

ID  | COL1_A | COL1_B | COL1_C  | COL1_A_VAL1_SUM | COL1_B_VAL1_SUM | COL1_C_VAL1_SUM
----|--------|--------|---------|-----------------|-----------------|------------------
111 | 2      | 1      | 0       | 15              | 10              | 0
222 | 0      | 1      | 0       | 0               | 20              | 0
333 | 0      | 0      | 1       | 0               | 0               | 25

for example because:

  • COL1_A = 2 for ID 111, because ID=111 has 2 times "A" in COL1
  • COL1_A_VAL1_SUM = 15 for ID 111, because ID=111 has 10+5=15 in VAL1 for "A" in COL1

How can I do that in SAS Enterpriuse Guide or in PROC SQL ?


Solution

  • First, we'll create the counts that we need by group with SQL:

    proc sql;
        create table totals_by_group as
            select id
                 , col1
                 , count(col1) as count_col1
                 , sum(val1)   as sum_val1
            from have
            group by id, col1
        ;
    quit;
    

    This produces the following table:

    id  col1    count_col1  sum_val1
    111 A       2           15
    111 B       1           10
    222 B       1           20
    333 C       1           25
    

    Now we need to transpose this into the way we want it. We'll do this with two transpose steps: one for count_col1, and one for sum_val1. proc transpose has a few handy options to make this easy, namely the id, prefix, and suffix options.

    First, we'll consider our ID variable col1. This creates columns named A, B, and C. For example:

    id  A   B   C
    111 2   1   .
    222 .   1   .
    333 .   .   1
    

    The prefix and suffix options let us add a prefix and suffix to these names.

    proc transpose 
        data   = totals_by_group 
        out    = count_by_group(drop=_NAME_)
        prefix = COL1_;
    
        by id;
        id col1;
        var count_col1;
    run;
    
    proc transpose 
        data   = totals_by_group 
        out    = sum_by_group(drop=_NAME_)
        prefix = COL1_ 
        suffix = _VAL1_SUM;
    
        by id;
        id col1;
        var sum_val1;
    run;
    

    This gives us two tables:

    COUNT_BY_GROUP 
    id  COL1_A  COL1_B  COL1_C
    111 2       1       .
    222 .       1       .
    333 .       .       1
    
    SUM_BY_GROUP
    id  COL1_A_VAL1_SUM COL1_B_VAL1_SUM COL1_C_VAL1_SUM
    111 15              10              .
    222 .               20              .
    333 .               .               25
    

    Now we just need to merge them together, then set all missing values to 0 by iterating over each numeric column and checking if it's missing.

    data want;
        merge count_by_group  
              sum_by_group
        ;
        by id;
    
        array numvars[*] _NUMERIC_;
    
        do i = 1 to dim(numvars);
            if(missing(numvars[i])) then numvars[i] = 0;
        end;
    
        drop i;
    run;
    

    Final table:

    id  COL1_A  COL1_B  COL1_C  COL1_A_VAL1_SUM COL1_B_VAL1_SUM COL1_C_VAL1_SUM
    111 2       1       0       15              10              0
    222 0       1       0       0               20              0
    333 0       0       1       0               0               25