Search code examples
sasproc-sql

Adding rows for missing values when grouping by multiple variables in proc sql


Like the title suggests, I'm trying to add rows for observations whose value is 0 in proc sql.

When I do a frequency check, I get this:

Proc Freq

I do a proc sql to group by my respective categories using this code which fails to account for 0 observations for some combinations show in the proc freq:

proc sql; 
create table count7 as
select distinct status as d_cat, program, count(status) as count_16, 'm_cit' as m_type
from cy16
group by status, program; 
quit; 

which produces: Have

I want to create rows from the Proc Freq that produces rows for those categories that have 0 observations. Can this be achieved in a proc sql or would I need an additional data step too?


Solution

  • You can use the SPARSE option on the TABLES statement to get PROC FREQ to include the empty cells. In PROC SQL you will need to make a shell dataset of all possible combinations.

    Let's get some sample data with at least one empty cell:

    data class;
     set sashelp.class;
     where age in (15 16);
    run;
    

    We can use PROC FREQ to generate counts.

    proc freq data=class;
     tables age*sex/noprint sparse out=count1(drop=percent);
    run;
    

    Or PROC SQL:

    proc sql ;
    create table count2 as
    select b.age,c.sex,coalesce(a.count,0) as COUNT 
      from (select age,sex,count(*) as COUNT 
            from class group by age,sex ) a
      full join 
           ((select distinct age from class) b
           ,(select distinct sex from class) c
           )
      on a.age=b.age and a.sex=c.sex
      order by age,sex
    ;
    quit;
    

    You can also use the CLASS statement and the COMPLETETYPES and NWAY options in PROC SUMMARY.

    proc summary data=class complettypes nway;
     class age sex ;
     output out=count3(drop=_type_ rename=(_freq_=COUNT));
    run;