input VAR1 VAR2
A1 1
A2 0
A3 1
A4 1
A5 1
A6 1
A7 1
A8 1
A9 1
A10 1
A15 1
B7 0
A1 0
A16 1
A17 1
A18 1
A19 1
A20 0
A21 1
end
Say you have data such as the ones shown. I have VAR1
and wish to create from it VAR2
which takes values 1 if VAR1
contains at the beginning: A1, A3-A10, A15-A19, A21 or if not then it is zero. I believe for this you can use strpos(VAR1)
but is it possible to say for example: strpos(VAR1, "A1, A3/A10, A15/A19, A21")
?
The following works if you have a small number of strings of interest. You may need an alternate approach if you are searching for a larger number of strings, where writing out the ranges of strings (e.g. A3-A10) is unfeasible.
clear
input str3 VAR1 VAR2
A1 1
A2 0
A3 1
A4 1
A5 1
A6 1
A7 1
A8 1
A9 1
A10 1
A15 1
B7 0
A1 1
A16 1
A17 1
A18 1
A19 1
A20 0
A21 1
end
gen wanted = 0
local mystrings = "A1 A3 A4 A5 A6 A7 A8 A9 A10 A15 A16 A17 A18 A19 A21"
foreach string in `mystrings' {
replace wanted = 1 if strpos(VAR1, "`string'") == 1
}
assert wanted == VAR2
Note that in your example input, the second occurrence of A1 had a value of 0 but should have a value of 1 according to your post.
Here is a more generalisable solution for larger ranges of strings:
gen A = 0
replace A = 1 if strpos(VAR1,"A") == 1
gen newvar = substr(VAR1,2,.)
destring newvar, replace
gen wanted = 0
replace wanted = 1 if A == 1 & (inlist(newvar,1,21) | inrange(newvar,3,10) | inrange(newvar,15,19))
assert wanted == VAR2