Search code examples
sassas-macro

counting values per subject


I would like to add a new column to a dataset called COUNT. My dataset has a variables subject, tox1 , tox2, tox3 (character variables) with different values. A subject can appear multiple times in my dataset, I want to count the occurrence of 'ADR' per subject. What I want to do is create a new variable called COUNT that counts how many times a subject has a specific value ie 'ADR' for tox1, tox2 and tox3; when a SUBJECT does not have an observation for that specific value, I want COUNT to have a result of zero.

Here's an example of the dataset I would like (in this example, I want to count every instance of "ADR" per SUBJECT as COUNT).

 INPUT subject $ tox1 $ tox2 $ tox3 $;
 datalines;
1   ADR c s
1   ADR d f
1   s ADR f
1   a w e 
2   d f g
3   e d ADR 
3   k ADR f
4   N ADR ADR
4   Q ADR d
4   ADR d f
4   r v h 
4   ADR f g
5   g f d
5   r t u
5   g h j   
;

THIS IS INPUT DATASET

SUBJECT TOX1 TOX2 TOX3 COUNT
1   ADR c s    3
1   ADR d f    3
1   s ADR f    3
1   a w e      3
2   d f g      0 
3   e d ADR    2
3   k ADR f    2 
4   N ADR ADR  5
4   Q ADR d    5
4   ADR d f    5
4   r v h      5
4   ADR f g    5
5   g f d      0
5   r t u      0
5   g h j      0

THIS IS THE REQUIRED OUTPUT FOR COUNT VARIABLE


Solution

  • Here is a data step approach

    data have;
    INPUT subject $ tox1 $ tox2 $ tox3 $; 
    datalines; 
    1 ADR c   s   
    1 ADR d   f   
    1 s   ADR f   
    1 a   w   e   
    2 d   f   g   
    3 e   d   ADR 
    3 k   ADR f   
    4 N   ADR ADR 
    4 Q   ADR d   
    4 ADR d   f   
    4 r   v   h   
    4 ADR f   g   
    5 g   f   d   
    5 r   t   u   
    5 g   h   j   
    ;
    
    data want (drop = i);
       count = 0;
       do _N_ = 1 by 1 until (last.subject);
          set have;
          by subject;
          array tox {3};
          do i = 1 to dim(tox);
             if tox [i] = 'ADR' then count = sum(count, 1);
          end;
       end;
       do _N_ = 1 to _N_;
          set have;
          output;
       end;
    run;
    

    Result:

    count subject tox1 tox2 tox3 
    3     1       ADR  c    s 
    3     1       ADR  d    f 
    3     1       s    ADR  f 
    3     1       a    w    e 
    0     2       d    f    g 
    2     3       e    d    ADR 
    2     3       k    ADR  f 
    5     4       N    ADR  ADR 
    5     4       Q    ADR  d 
    5     4       ADR  d    f 
    5     4       r    v    h 
    5     4       ADR  f    g 
    0     5       g    f    d 
    0     5       r    t    u 
    0     5       g    h    j