Tags: regex, stata, non-greedy

No Access to Non-Greedy .*?


My text is:

999 blaw blaw blaw1 999 blaw blaw blaw

And I want to extract:

blaw blaw blaw1

Now, I could do this using:

([0-9][0-9][0-9] )(.*?)( [0-9][0-9][0-9])

But the problem is that I can't use ".*?" in the tool I'm using. Replacing (.*?) with ([^0-90-90-9]*) would have worked if the text I want did not itself contain a digit (the 1 in blaw1)!

Any suggestions? I'm using Stata, if that is relevant.


Solution

  • Based on the comment by hwnd:

    clear
    set more off
    
    *----- example data -----
    
    input str60 text
    "999 blaw blaw blaw1 999 blaw blaw blaw"
    end
    
    list
    
    *----- what you want -----
    
    gen extract = regexs(2) if regexm(text, "(^[0-9][0-9][0-9] )(.+)( [0-9][0-9][0-9])")
    
    list
    

    Also, if the numbers can have any number of digits:

    ... regexm(text, "(^[0-9]+ )(.+)( [0-9]+)")
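
    Note that (.+) is greedy, so both patterns above capture everything up to the last " NNN" in the string. That is fine for the example, which has only one trailing number. If the text can contain several numbers, and assuming Stata 14 or later is available, the Unicode regex functions ustrregexm() and ustrregexs() should accept the non-greedy .*? directly. A minimal sketch (the longer example string and the variable names greedy and lazy are made up for illustration):

    ```stata
    clear
    set more off

    *----- hypothetical data with a second numeric delimiter -----

    input str60 text
    "999 blaw blaw blaw1 999 blaw blaw blaw 888 more"
    end

    *----- greedy: (.+) runs up to the LAST " NNN" -----

    gen greedy = regexs(2) if regexm(text, "(^[0-9][0-9][0-9] )(.+)( [0-9][0-9][0-9])")

    *----- non-greedy: (.*?) stops at the FIRST " NNN" (assumes Stata 14+) -----

    gen lazy = ustrregexs(2) if ustrregexm(text, "(^[0-9]{3} )(.*?)( [0-9]{3})")

    list greedy lazy
    ```

    With this input, greedy should hold "blaw blaw blaw1 999 blaw blaw blaw" and lazy should hold "blaw blaw blaw1". On older Stata versions only the greedy form is available, so it works only when the closing number is the last one in the string.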
    

    From help regex:

    Regular expression syntax is based on Henry Spencer's NFA algorithm, and this is nearly identical to the POSIX.2 standard. [arguments] may not contain binary 0 (\0).

    Other references are:

    http://www.stata.com/support/faqs/data-management/regular-expressions/

    http://www.ats.ucla.edu/stat/stata/faq/regex.htm