Search code examples
sasindicator

How to identify two rows with same characters at first stream


I want to create a indicator variable, "same_first_two_nearby". That indicates that the first two characters of observations are equal to the nearby observations. I try to use the "duplication method", but I fail. Because the method can only "delete" the duplication but not keep.

PROC SORT data=temp NODUPKEY;
 BY customer_IN;
RUN; 

The example of my data is as following.

data temp;
input  customer_IN $ 1-8 ;
cards;
ADJOHN. 
ADMARY.  
ADjerry.  
BWABBY.   
CFLUCY.   
CFLINDA.   
EFLAGNA.  
KTPAKAO.   
KTWANDA.    
;
run;
proc print data=temp;run;

I want to generate the results as the following.

customer_IN  same_first_two_nearby
    ADJOHN.                      1
    ADMARY.                      1 
    ADjerry.                     1 
    BWABBY.                      0 
    CFLUCY.                      1 
    CFLINDA.                     1 
    EFLAGNA.                     0 
    KTPAKAO.                     1 
    KTWANDA.                     1 

Thanks in advance.


Solution

  • You can do this using a helper column containing the first two characters, provided that it is sorted as per the original question:

    data temp;
    input  customer_IN $ 1-8 initials $ 1-2;
    cards;
    ADJOHN. 
    ADMARY.  
    ADjerry.  
    BWABBY.   
    CFLUCY.   
    CFLINDA.   
    EFLAGNA.  
    KTPAKAO.   
    KTWANDA.    
    ;
    run;
    
    data want;
    set temp;
    by initials;
    same_first_two_nearby = not(first.initials and last.initials);
    run;