Search code examples
sasteradata-sql-assistant

Selecting one obs from multiple ones based on some conditions


I have a dataset that includes accounts having multiple rows. I would like to keep one per each based on some filtering conditions. The dataset is

Account Model Class Value
1         A     0     1.0
1         B     0     1.0
2         B     0     0.5
3         A     1     0.5
3         A     0     1.0

I would like to keep one row per each account based on these conditions:

  • if an account has only one row, then keep it, regardless class and model.
  • if an account has multiple rows, then keep the account having Model A and Class = 1 or having Model A and Class whatever.

The expected output would be

Account Model Class Value
    1         A     0     1.0
    2         B     0     0.5
    3         A     1     0.5

The code is implemented in SAS. What I have thought is to assign a value for those accounts having multiple rows and select only those ones that satisfy the above conditions. However I do not know how to select accounts having only one row. I think that I could have a look only at max Class as it will select regardless Model (generally Class is the highest associated with Model A).


Solution

  • Assuming Class has a maximum value of 1, you can just sort your data according to your conditions and output the first one.

    data have;
    infile datalines;
    input Account Model $ Class Value;
    datalines4;;;;
    1 A 0 1.0
    1 B 0 1.0
    2 B 0 0.5
    3 A 1 0.5
    3 A 0 1.0
    ;;;;
    run;
    
    proc sort out=stage1 data=have; by account model descending class;run;
    
    data want;
       set stage1;
       by account model descending class;
       if first.account then output;
    run;
    

    However I do not know how to select accounts having only one row.

    You can use the first. and last. data step variables to select accounts having only one occurence. Indeed if first.account and last.account then it means there is only one observation in the by group.

    Basically, the FIRST. assigns the value of one for the first observation in a by group and a value of zero for all other observations in the by group. Keep in mind that in your specific case, the data need to be sorted by account before applying the FIRST. variable.