I have a dataset that includes accounts having multiple rows. I would like to keep one per each based on some filtering conditions. The dataset is
Account Model Class Value
1 A 0 1.0
1 B 0 1.0
2 B 0 0.5
3 A 1 0.5
3 A 0 1.0
I would like to keep one row per each account based on these conditions:
The expected output would be
Account Model Class Value
1 A 0 1.0
2 B 0 0.5
3 A 1 0.5
The code is implemented in SAS. What I have thought is to assign a value for those accounts having multiple rows and select only those ones that satisfy the above conditions. However I do not know how to select accounts having only one row. I think that I could have a look only at max Class as it will select regardless Model (generally Class is the highest associated with Model A).
Assuming Class
has a maximum value of 1, you can just sort your data according to your conditions and output the first one.
data have;
infile datalines;
input Account Model $ Class Value;
datalines4;;;;
1 A 0 1.0
1 B 0 1.0
2 B 0 0.5
3 A 1 0.5
3 A 0 1.0
;;;;
run;
proc sort out=stage1 data=have; by account model descending class;run;
data want;
set stage1;
by account model descending class;
if first.account then output;
run;
However I do not know how to select accounts having only one row.
You can use the first.
and last.
data step variables to select accounts having only one occurence. Indeed if first.account and last.account
then it means there is only one observation in the by group.
Basically, the FIRST. assigns the value of one for the first observation in a by group and a value of zero for all other observations in the by group. Keep in mind that in your specific case, the data need to be sorted by account before applying the FIRST. variable.