
regex matching tutorial


I've been using regex for years and have read several tutorials and references (the Emacs regex reference is my bible), but I still have trouble understanding matching. Is there a good, comprehensive tutorial on regex matching with plenty of examples? Can anybody give me a link that will finally help me understand regex matching in depth?

An example of the problem bothering me:

haystack = "[{one, {one, andahalf}},\n {{two, zero}, two},\n {{threezero}, three},\n {four}]"
pattern = "({.+})"

Result is:

{one, {one, andahalf}}
{{two, zero}, two}
{{threezero}, three}
{four}

Now, what is that exactly? Greedy or non-greedy? (It's C# Regex.Matches.)

Why, oh why, isn't the (non-greedy) result:

{one, {one, andahalf}
{{two, zero}
{{threezero}
{four}

(matching the first possible pair of {})

Or (greedy):

{one, {one, andahalf}},\n {{two, zero}, two},\n {{threezero}, three},\n {four}

(matching the largest possible pair of {})

Of course, the actual result is exactly what I need, and I'm very happy that regex reads my mind, but I'd rather be able to read its mind :-D So, does anybody know a decent tutorial on regex matching that will help me understand how this match did what it did?


Solution

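  • The reason this happened to work is that the brace groups in your input are separated by newlines, and by default the dot (in the .+ part of your regex) matches any character except a newline. The greedy .+ therefore runs from the first { to the last } on each line and cannot continue onto the next line. To let the dot match newlines as well, compile the regex with RegexOptions.Singleline set; the greedy match then spans from the very first { to the very last }.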

    So it's just a coincidence that the braces were correctly balanced during this greedy match.

    A good regex tutorial can be found at http://www.regular-expressions.info.

    By the way, for safety, literal braces should always be escaped (\{, \}). The .NET regex engine happens to recognize that they can't be a quantifier in this context, but some other engines will fail to compile this regex.
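
The three behaviours discussed here can be reproduced with a minimal sketch. It is written in Python rather than C#, purely as an assumption for brevity: Python's re.DOTALL flag plays the role that RegexOptions.Singleline plays in .NET, and the variable names are illustrative only. The braces are escaped, following the advice above.

```python
import re

haystack = "[{one, {one, andahalf}},\n {{two, zero}, two},\n {{threezero}, three},\n {four}]"

# Greedy, with the dot stopping at newlines (the default): one match per
# line, from the first "{" on that line to the last "}" on that line.
greedy_per_line = re.findall(r"\{.+\}", haystack)

# Non-greedy: each match ends at the first "}" that follows its opening "{".
lazy = re.findall(r"\{.+?\}", haystack)

# Greedy, with the dot also matching newlines (re.DOTALL here, standing in
# for RegexOptions.Singleline): a single match from the very first "{" to
# the very last "}".
greedy_dotall = re.findall(r"\{.+\}", haystack, flags=re.DOTALL)

for label, matches in [
    ("greedy, per line", greedy_per_line),
    ("non-greedy", lazy),
    ("greedy, dot matches newline", greedy_dotall),
]:
    print(label)
    for m in matches:
        print("   ", repr(m))
```

The first list is the result from the question, the second is the non-greedy result the asker expected, and the third is the single long match the asker expected from a greedy pattern. None of these patterns actually checks that braces are balanced; the first one only looks balanced because each line of the input happens to contain one complete group.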