Skipping sections dynamically in a regular expression


I'm trying to develop a regular expression that matches everything up to the first period in a sentence, as long as that period is not inside parentheses.

So, for example, the string:

Tom (Ed.) went down to the shop where the owners (J. Guys, A. Owner, and B. Ains) gathered. It was a great night.

Should return:

Tom (Ed.) went down to the shop where the owners (J. Guys, A. Owner, and B. Ains) gathered.

However, I find that using a lazy approach, I only get:

Tom (Ed.

And, using a greedy approach, I obviously get the whole string. Not all sentences are structured like this (some have no parentheses at all, for example). I've also tried using a negative lookahead, but I don't really understand how it works.

Does anyone have an idea how to proceed?
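
The lazy and greedy results described above can be reproduced with a short sketch. The two patterns used here, `.*?\.` and `.*\.`, are assumptions, since the question does not show the exact expressions that were tried; the class and method names are also only illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyVsGreedy {
    /** Returns the first match of regex in text, or null when nothing matches. */
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "Tom (Ed.) went down to the shop where the owners "
                + "(J. Guys, A. Owner, and B. Ains) gathered. It was a great night.";

        // Lazy (assumed pattern): stops at the very first period, even inside parentheses.
        System.out.println(firstMatch(".*?\\.", input));

        // Greedy (assumed pattern): runs on to the last period in the string.
        System.out.println(firstMatch(".*\\.", input));
    }
}
```

Neither quantifier knows anything about the parentheses, so the fix has to come from an extra condition on the period itself rather than from switching between lazy and greedy.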


Solution

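  • You can use this regex in Java to match a period that is not inside round brackets:

    (?=(?:[^()]*\([^()]*\))*[^()]*$)\.

    The lookahead succeeds only when everything from the period to the end of the string consists of text without parentheses plus complete (...) groups, so a period that is followed by an unmatched ) is rejected. It assumes the parentheses are balanced and not nested.

    To match the whole sentence, Tom (Ed.) went down to the shop where the owners (J. Guys, A. Owner, and B. Ains) gathered., put a lazy .*? in front:

    .*?(?=(?:[^()]*\([^()]*\))*[^()]*$)\.

    Also, in Java, you will have to double the backslashes inside the string literal:

    String pattern = ".*?(?=(?:[^()]*\\([^()]*\\))*[^()]*$)\\.";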
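
A minimal sketch of the lookahead approach with java.util.regex, run against the example from the question. The class name `FirstSentence` and the helper `firstSentence` are hypothetical names chosen for illustration. The pattern uses `[^()]` in every character class so that a stray closing parenthesis cannot be skipped over, and it assumes single-line input with balanced, non-nested parentheses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstSentence {
    // Lazy prefix, then a period whose remaining text holds only
    // parenthesis-free runs and complete (...) groups up to the end of input.
    private static final Pattern FIRST_SENTENCE = Pattern.compile(
            ".*?(?=(?:[^()]*\\([^()]*\\))*[^()]*$)\\.");

    /** Returns the text up to the first period outside parentheses, or null if there is none. */
    static String firstSentence(String text) {
        Matcher m = FIRST_SENTENCE.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "Tom (Ed.) went down to the shop where the owners "
                + "(J. Guys, A. Owner, and B. Ains) gathered. It was a great night.";
        System.out.println(firstSentence(input));
    }
}
```

Sentences without any parentheses go through the same pattern unchanged, because the group inside the lookahead is allowed to match zero times. If nested parentheses can occur, a regex of this shape is not enough and a small character-by-character scan that tracks nesting depth is likely the simpler option.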