Search code examples
sasradixsas-macro

SAS Programming - removing duplicates


I have the following data based on distance between the cities.

Source Destination Distance 
USA       UK        1000 
USA      Spain      200 
UK        USA       1000 
Germany  Spain      500 
Spain     USA       200 

I want to remove the duplicates where source and destination are same. For Example USA to UK will be same as UK to USA and hence the duplicate value needs to be removed.

Following is the desired output.

Source Destination Distance 
USA        UK        1000 
USA      Spain       200 
Germany  Spain       500 

Solution

  • First produce a dummy variable to hold sorted source and destination by call sortc, then sort by dummy variable.

    data have;
    input Source $ Destination $ Distance;
    cards;
    USA       UK        1000 
    USA      Spain      200 
    UK        USA       1000 
    Germany  Spain      500 
    Spain     USA       200 
    ;
    
    data temp;
        set have;
        length dummy $50.;
        _var1=source; _var2=destination;
        call sortc (of _:);
        dummy=catx(' ',of _:);
        drop _:;
    run;
    
    proc sort data=temp out=want(drop=dummy) nodupkey;
    by dummy;
    run;