Search code examples
sqlcountsashavingproc-sql

How to group records so the groups contains only specific variables? (proc sql)


I have a database of people with information that in which households they are included. How can I group in one statement households in which lives only woman or only a specific group of people?

Example:

enter image description here

So from this example I would like to get only one row (B) and the count that there lives two people.

To better understand I can do it when I want households where lives only people under 20 years old. The following code works:

proc sql;
    create table new_tab as
    select household, count(*) as freq
    from table
    group by household
    having max(gender) < 21;
quit;

But I couldnt find a similar code when it is not a numeric value. Can someone please give an advise how could I group only households in which lives only woman?


Solution

  • Consider computing an aggregate categorical value and counts that can be used in later analysis.

    Example:

    select household, 
      case 
        when min(gender) ne max(gender) then 'mixed'
        else min(gender)
      end as gender
    , sum (gender eq 'woman') as woman_count
    , sum (gender eq 'man') as man_count
    , sum (missing(gender)) as missingG_count
    , sum (not missing(gender) and gender not in ('woman','man')) as otherG_count
    from have
    group by household