Search code examples
sasproc-sqldatastep

Select many columns and other non-continuous columns to find duplicate?


I have a dataset with many columns like this:

ID  Indicator Name   C1 C2 C3....C90
 A    0001    Black   0  1  1.....0
 B    0001    Blue    1  0  0.....1
 B    0002    Blue    1  0  0.....1

Some of the IDs are duplicates because the indicator is different, but they're essentially the same record. To find duplicates, I want to select distinct ID, Name and then C1 through C90 to check because some claims who have the same Id and indicator have different C1...C90 values.

Is there a way to select c1...c90 either through proc sql or a sas data step? It seems the only way I can think of is to set the dataset and then drop the non essential columns, but in the actual dataset, it's not only Indicator but at least 15 other columns.


Solution

  • It would be nice if PROC SQL used the : variable name wildcard like other Procs do. When no other alternative is reasonable, I usually use a macro to select bulk columns. This might work for you:

    %macro sel_C(n);
        %do i=1 %to %eval(&n.-1);
            C&i.,
        %end;
        C&n.
    %mend sel_C;
    proc sql;
        select ID,
               Indicator,
               Name,
               %sel_C(90)
        from have_data;
    quit;