Hello, and sorry for the long title! I am working with data that contains long text strings (some observations have up to ~2000 characters). A particular word (AB/CD) may appear anywhere within a string. I am trying to detect AB/CD in the text and create a binary variable (ABCD_present) that indicates whether the word appears.
Below is some example data:
data test;
    length status $175;
    infile datalines dsd dlm="|" truncover;
    input ID Status $;
datalines;
1|This is example text I am using instead of real data. I am making the length of this text longer to mimic the long text strings of my data AB/CD
2|This is example AB/CD text I am using instead of real data. I am making the length of this text longer to mimic the long text strings of my data
3|This is example text I am using instead of real data. I AB/CD am making the length of this text longer to mimic the long text strings of my data
4|This is example text I am using instead of real data. I am making the length of this text longer to mimic the long text strings of my data
5|This is example text I am using instead of real data. I am making the length of this text longer to mimic the long text strings of my data
6|This is example text I am using instead of real data. I am making the length of this text longer to AB/CD mimic the long text strings of my data
;
run;
Any guidance on this would be lovely! I do not have much experience working with long text strings.
Thank you in advance.
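You can use the FIND function. It returns the position of the first match, or 0 if the substring is not found, so comparing the result to 0 gives a 1/0 flag.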
data want;
    set test;
    /* 1 if 'AB/CD' occurs anywhere in status, 0 otherwise */
    flag_abcd = (find(status, 'AB/CD') > 0);
run;
Status  ID  flag_abcd
...      1          1
...      2          1
...      3          1
...      4          0
...      5          0
...      6          1
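If the real text might contain the word in mixed case (for example "ab/cd"), a variant along these lines may help. This is only a sketch: it assumes the TEST data set from the question, uses the ABCD_present name the question asked for, and writes to WANT_NOCASE, a data set name made up for illustration. FIND is case-sensitive by default; the 'i' modifier tells it to ignore case.

```sas
/* Sketch: case-insensitive version of the FIND approach.           */
/* WANT_NOCASE is a hypothetical output data set name.              */
data want_nocase;
    set test;
    /* 'i' = ignore case; the comparison evaluates to 1 or 0 */
    ABCD_present = (find(status, 'AB/CD', 'i') > 0);
run;
```

One thing to check on the real data: the character variable has to be long enough to hold the whole string (the example uses $175, while the real strings run to ~2000 characters). A value that is truncated on input could lose an AB/CD near the end, and the flag would then be 0.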