Search code examples
sassum

Sum columns based on several other columns - SAS


I'm trying to sum some columns based on several other columns, and then produce a new table with the results in.

Say I have the following data:

Col1 Col2 Col3 Col4 Col5 Col6
AAAA BBBB CCCC DDDD 3 1
AAAA BBBB CCCC DDDD 5 1
WWWW XXXX YYYY ZZZZ 1 4
WWWW XXXX YYYY ZZZZ 8 2

And I want to sum Col5 and Col6 (separately) where Col 1-4 are the same. i.e. the output I want is:

Col1 Col2 Col3 Col4 Col5 Col6
AAAA BBBB CCCC DDDD 8 2
WWWW XXXX YYYY ZZZZ 9 6

I've put my code below, but its giving me the following:

Col1 Col2 Col3 Col4 Col5 Col6
AAAA BBBB CCCC DDDD 8 2
AAAA BBBB CCCC DDDD 8 2
WWWW XXXX YYYY ZZZZ 9 6
WWWW XXXX YYYY ZZZZ 9 6

Any help would be greatly appreciated to:

a) get this to code work.

b) show me a better (more efficient?) way of doing this? I think I've massively(!) overcomplicated this (I'm very new to SAS!).

--- Code ---

data XXX;
input Col1 $ Col2 $ Col3 $ Col4 $ Col5 Col6;
datalines;
AAAA BBBB CCCC DDDD 3 1
AAAA BBBB CCCC DDDD 5 1
WWWW XXXX YYYY ZZZZ 1 4
WWWW XXXX YYYY ZZZZ 8 2
;
run;

data test1;
set XXX;
groupID = put(md5(upcase(catx('|',Col1,Col2,Col3,Col4))),hex32.);
run;

proc sort data = test1;
by groupID;
run;

proc summary data = test1;
var Col5 Col6;
by groupID;
Output out = want sum=;
run;

proc sql;
create table test1_results as
select b.Col1,b.Col2,b.Col3,b.Col4, a.*
from want as a
left join test1 as b
on a.groupID = b.groupID;
run;

data Final_table;
set test1_results;
Keep Col1 Col2 Col3 Col4 Col5 Col6;
run;

Solution

  • I think you need Proc SUMMARY. The remaining steps are unnecessary. Key concept - BY or CLASS statements take multiple variables.

    data XXX;
        input Col1 $ Col2 $ Col3 $ Col4 $ Col5 Col6;
        datalines;
    AAAA BBBB CCCC DDDD 3 1
    AAAA BBBB CCCC DDDD 5 1
    WWWW XXXX YYYY ZZZZ 1 4
    WWWW XXXX YYYY ZZZZ 8 2
    
    ;
    run;
    
    proc summary data=xxx NWAY noprint;
        class col1 col2 col3 col4;
        var Col5 Col6;
        Output out=want (drop=_type_ _freq_) sum=;
    run;
    
    proc print data=want;run;