My text is:
999 blaw blaw blaw1 999 blaw blaw blaw
And I want to choose:
blaw blaw blaw1
Now, I could do this using:
([0-9][0-9][0-9] )(.*?)( [0-9][0-9][0-9])
But the problem is that I can't use ".*?" in the tool I'm using. Replacing (.*?)
with ([^0-90-90-9]*)
would have worked if the text didn't contain the 1 in blaw1, since that character class excludes every single digit, not just runs of three digits.
Any suggestions? I'm using Stata, if that is relevant.
Based on the comment by hwnd:
clear
set more off
*----- example data -----
input str60 text
"999 blaw blaw blaw1 999 blaw blaw blaw"
end
list
*----- what you want -----
gen extract = regexs(2) if regexm(text, "(^[0-9][0-9][0-9] )(.+)( [0-9][0-9][0-9])")
list
Also, to allow digit groups of any length:
... regexm(text, "(^[0-9]+ )(.+)( [0-9]+)")
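One thing worth checking with either pattern is that (.+) is greedy, so when a line contains more than two numeric groups the capture should run up to the last one rather than the nearest one. The following is a minimal sketch of that behaviour; the second data row is made up for illustration and is not from the question.

```stata
clear
set more off

*----- example data (second row is a hypothetical extra case) -----
input str60 text
"999 blaw blaw blaw1 999 blaw blaw blaw"
"12 foo 345 bar 6789 baz"
end

*----- same pattern as above; (.+) takes as much as it can -----
gen extract = regexs(2) if regexm(text, "(^[0-9]+ )(.+)( [0-9]+)")

* row 1 should give: blaw blaw blaw1
* row 2 should give: foo 345 bar (stops at the LAST number group, not the first)
list
```

If only the text up to the nearest following number is wanted, the greedy pattern is not enough on its own for lines like the second one.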
From help regex:
Regular expression syntax is based on Henry Spencer's NFA algorithm, and this is nearly identical to the POSIX.2 standard. [arguments] may not contain binary 0 (\0).
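The syntax described above applies to regexm() and regexs(). As a hedged aside that is not part of the original answer: Stata 14 and later also ship the Unicode functions ustrregexm() and ustrregexs(), which use a different regex engine that accepts lazy quantifiers such as ".*?". Assuming such a version is available, the pattern from the question can be used almost as written:

```stata
clear
set more off

*----- example data -----
input str60 text
"999 blaw blaw blaw1 999 blaw blaw blaw"
end

*----- assumes Stata 14 or later (ustrregexm/ustrregexs) -----
* (.*?) is lazy: it stops at the first following group of three digits
gen extract_lazy = ustrregexs(2) if ustrregexm(text, "^([0-9]{3} )(.*?)( [0-9]{3})")
list
```

On older versions, the regexm() approach shown earlier remains the option to use.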
Other references are:
http://www.stata.com/support/faqs/data-management/regular-expressions/