Search code examples
sassas-macro

COUNTING VALUE PER PARTCIPANTS


I would like to add a new column to a dataset but I am not sure how to do so. My dataset has a variable called KEYVAR (character variable) with three different values. A participant can appear multiple times in my dataset, with each row containing a similar or different value for KEYVAR. What I want to do is create a new variable call NEWVAR that counts how many times a participant has a specific value for KEYVAR; when a participant does not have an observation for that specific value, I want NEWVAR to have a result of zero.

Here's an example of the dataset I would like (in this example, I want to count every instance of "Y" per participants as newvar):

have

PARTICIPANT KEYVAR  
A   Y   
A   N   
B   Y   
B   Y   
B   Y   
C   W   
C   N   
C   W   
D   Y   
D   N   
D   N   
D   Y   
D   W   

want

PARTICIPANT KEYVAR  NEWVAR
A   Y   1
A   N   1
B   Y   3
B   Y   3
B   Y   3
C   W   0
C   N   0
C   W   0
D   Y   2
D   N   2
D   N   2
D   Y   2
D   W   2

Solution

  • You can use Proc SQL to compute an aggregate result over a group meeting a criteria, and have that aggregate value automatically merged into the result set.

    -OR-

    Use a MEANS, TRANSPOSE, MERGE approach

    Sample Code (SQL)

    data have;
    input ID $ value $; datalines;
    A   Y
    A   N
    B   Y
    B   Y
    B   Y
    C   W
    C   N
    C   W
    D   Y
    D   N
    D   N
    D   Y
    D   W
    E   X   
    ;
    
    proc sql;
      create table want as
      select ID, value
      , sum(value='Y') as Y_COUNT  /* relies on logic eval 'math' 0 false, 1 true */
      , sum(value='N') as N_COUNT
      , sum(value='W') as W_COUNT    
      from have
      group by ID
      ;
    

    enter image description here

    Sample Code (PROC and MERGE)

    * format for PRELOADFMT and COMPLETETYPES;
    proc format;
      value $eachvalue
        'Y' = 'Y'
        'N' = 'N'
        'W' = 'W'
        other = '-';
      ;
    run;
    
    * Count how many per combination ID/VALUE;
    proc means noprint data=have nway completetypes;
      class ID ;
      class value / preloadfmt;
      format value $eachvalue.;
      output out=freqs(keep=id value _freq_);
    run;
    
    
    * TRANSPOSE reshapes to wide (across) data layout, one row per ID;
    proc transpose data=freqs suffix=_count out=counts_across(drop=_name_);
      by id;
      id value;
      var _freq_;
      where put(value,$eachvalue.) ne '-';
    run;
    
    * MERGE;
    data want_way_2;
      merge have counts_across;
      by id;
    run;
    

    enter image description here