I want to create a indicator variable, "same_first_two_nearby". That indicates that the first two characters of observations are equal to the nearby observations. I try to use the "duplication method", but I fail. Because the method can only "delete" the duplication but not keep.
PROC SORT data=temp NODUPKEY;
BY customer_IN;
RUN;
The example of my data is as following.
data temp;
input customer_IN $ 1-8 ;
cards;
ADJOHN.
ADMARY.
ADjerry.
BWABBY.
CFLUCY.
CFLINDA.
EFLAGNA.
KTPAKAO.
KTWANDA.
;
run;
proc print data=temp;run;
I want to generate the results as the following.
customer_IN same_first_two_nearby
ADJOHN. 1
ADMARY. 1
ADjerry. 1
BWABBY. 0
CFLUCY. 1
CFLINDA. 1
EFLAGNA. 0
KTPAKAO. 1
KTWANDA. 1
Thanks in advance.
You can do this using a helper column containing the first two characters, provided that it is sorted as per the original question:
data temp;
input customer_IN $ 1-8 initials $ 1-2;
cards;
ADJOHN.
ADMARY.
ADjerry.
BWABBY.
CFLUCY.
CFLINDA.
EFLAGNA.
KTPAKAO.
KTWANDA.
;
run;
data want;
set temp;
by initials;
same_first_two_nearby = not(first.initials and last.initials);
run;