I would like to add a new column to a dataset called COUNT
. My dataset has a variables subject, tox1
, tox2
, tox3
(character variables) with different values. A subject can appear multiple times in my dataset, I want to count the occurrence of 'ADR'
per subject. What I want to do is create a new variable called COUNT
that counts how many times a subject has a specific value ie 'ADR'
for tox1
, tox2
and tox3
; when a SUBJECT does not have an observation for that specific value, I want COUNT
to have a result of zero.
Here's an example of the dataset I would like (in this example, I want to count every instance of "ADR"
per SUBJECT
as COUNT
).
INPUT subject $ tox1 $ tox2 $ tox3 $;
datalines;
1 ADR c s
1 ADR d f
1 s ADR f
1 a w e
2 d f g
3 e d ADR
3 k ADR f
4 N ADR ADR
4 Q ADR d
4 ADR d f
4 r v h
4 ADR f g
5 g f d
5 r t u
5 g h j
;
SUBJECT TOX1 TOX2 TOX3 COUNT
1 ADR c s 3
1 ADR d f 3
1 s ADR f 3
1 a w e 3
2 d f g 0
3 e d ADR 2
3 k ADR f 2
4 N ADR ADR 5
4 Q ADR d 5
4 ADR d f 5
4 r v h 5
4 ADR f g 5
5 g f d 0
5 r t u 0
5 g h j 0
Here is a data step approach
data have;
INPUT subject $ tox1 $ tox2 $ tox3 $;
datalines;
1 ADR c s
1 ADR d f
1 s ADR f
1 a w e
2 d f g
3 e d ADR
3 k ADR f
4 N ADR ADR
4 Q ADR d
4 ADR d f
4 r v h
4 ADR f g
5 g f d
5 r t u
5 g h j
;
data want (drop = i);
count = 0;
do _N_ = 1 by 1 until (last.subject);
set have;
by subject;
array tox {3};
do i = 1 to dim(tox);
if tox [i] = 'ADR' then count = sum(count, 1);
end;
end;
do _N_ = 1 to _N_;
set have;
output;
end;
run;
Result:
count subject tox1 tox2 tox3
3 1 ADR c s
3 1 ADR d f
3 1 s ADR f
3 1 a w e
0 2 d f g
2 3 e d ADR
2 3 k ADR f
5 4 N ADR ADR
5 4 Q ADR d
5 4 ADR d f
5 4 r v h
5 4 ADR f g
0 5 g f d
0 5 r t u
0 5 g h j