Tags: java, regex, regex-greedy

Java Matcher matches() method to match the entire region against the pattern


I have a pattern (\{!(.*?)\})+ that is meant to validate an expression of the form {!someExpression} repeated one or more times.

I am performing

Pattern.compile("(\\{!(.*?)\\})+").matcher("{!expression1} {!expression2}").matches() to match the entire region against the pattern.

There is a space between expression1 and expression2.

Expected -> false Actual -> true

I tried both greedy and lazy quantifiers but could not figure out the catch here. Any help is appreciated.


Solution

  • Of course it matches. Your regexp says so. matches() requires the pattern to match the whole string, and this pattern can match the whole string, so it is doing exactly what you asked. Try it in any regex tool.

    Specifically, (.*?) will happily match expression1} {!expression2. Why shouldn't it? You asked for 'non-greedy', but with matches() that cannot change whether the string matches. It only changes which possibility the engine tries first, and therefore how the groups are divided up or where a find() stops. Non-greedy does not mean 'magically do what I want', however useful that might seem. The . matches } just as well as it matches x.
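
    To make this concrete, here is a minimal sketch that runs the lazy and the greedy version of the pattern against the input from the question. The class name and the two helper methods are made up for illustration; only Pattern and Matcher come from the JDK.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical demo class; the name and helpers are illustrative only.
    final class LazyVersusGreedyDemo {
        static final String INPUT = "{!expression1} {!expression2}";
        static final Pattern LAZY = Pattern.compile("(\\{!(.*?)\\})+");
        static final Pattern GREEDY = Pattern.compile("(\\{!(.*)\\})+");

        // Returns what group 2 captured when the whole input matches, or null if matches() fails.
        static String innerGroupOnFullMatch(Pattern p, String input) {
            Matcher m = p.matcher(input);
            return m.matches() ? m.group(2) : null;
        }

        // Collects every substring reported by successive find() calls.
        static List<String> findAll(Pattern p, String input) {
            List<String> hits = new ArrayList<>();
            Matcher m = p.matcher(input);
            while (m.find()) {
                hits.add(m.group());
            }
            return hits;
        }

        public static void main(String[] args) {
            // matches(): both variants accept the input, and group 2 swallows the "} {!" in the middle.
            System.out.println(innerGroupOnFullMatch(LAZY, INPUT));   // expression1} {!expression2
            System.out.println(innerGroupOnFullMatch(GREEDY, INPUT)); // expression1} {!expression2
            // find(): here laziness does make a visible difference.
            System.out.println(findAll(LAZY, INPUT));   // [{!expression1}, {!expression2}]
            System.out.println(findAll(GREEDY, INPUT)); // [{!expression1} {!expression2}]
        }
    }
    ```

    With matches() the engine keeps backtracking until the whole string is consumed, so the lazy quantifier simply gets stretched. Only find(), which is free to stop early, shows a difference between the two.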

    As a general rule, if you're using non-greediness you're doing it wrong. It's not a universal rule: if you really know what you're doing (mostly, that you're changing how backreferences, group captures, or find() results get carved up), it's fine. But if you're tossing non-greediness in there as you write your regexp, that's usually a sign you misunderstand what you're actually writing down.

    Presumably, your intent with the non-greedy operator here is that you do not want it to also consume the } that 'ends' the {!expr} block.

    In that case, just ask for exactly that: "consume everything that isn't a }":

    Pattern.compile("(\\{!([^}]*)\\})+").matcher("{!expression1} {!expression2}").matches()
    

    returns false for that input, as you expected.
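
    As a quick check, here is a small sketch that wraps the negated-character-class pattern in a helper and tries a few inputs. The class and method names are hypothetical.

    ```java
    import java.util.regex.Pattern;

    // Hypothetical demo class; only the pattern itself comes from the answer above.
    final class NegatedClassDemo {
        static final Pattern BLOCKS = Pattern.compile("(\\{!([^}]*)\\})+");

        // True only if the entire input is one or more {!...} blocks with nothing in between.
        static boolean isValid(String input) {
            return BLOCKS.matcher(input).matches();
        }

        public static void main(String[] args) {
            System.out.println(isValid("{!expression1} {!expression2}")); // false: space between blocks
            System.out.println(isValid("{!expression1}{!expression2}"));  // true
            System.out.println(isValid("{!expression1}"));                // true
            System.out.println(isValid("{!}"));                           // true: * allows an empty body
        }
    }
    ```

    Note that [^}]* also accepts an empty expression; if {!} should be rejected, [^}]+ is one way to require at least one character.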

    If your intent is instead that expressions can also contain { and } symbols, and that this is a much more convoluted grammar, then your question cannot be answered without a full breakdown of what that grammar entails. Note that many grammars are not 'regular' (a specific term that refers to a subset of all imaginable grammars), and such grammars cannot be parsed with a regular expression. That's what the 'regular' in regular expression refers to: a class of grammars. Regexes can be used meaningfully on anything that fits a regular grammar. They are useless for anything that doesn't, even if it seems like it could work. Thus, if there is a sizable grammar behind this {!expr} syntax, it's possible you need an actual full parser for it.

    As a simple example, Java the language is not regular and therefore cannot meaningfully be parsed with regexes (that is: whatever your regex aims to do, I can write a valid Java file that the compiler accepts but that your regex gets wrong).
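
    For the nested-brace case, the grammar is not specified in the question, so the following is only a sketch under assumed rules: every block starts with {! and ends at the matching }, braces inside a block must balance, and nothing may appear between blocks. Arbitrarily deep balanced nesting is the kind of thing a regular grammar cannot express, so the sketch counts depth by hand. All names here are hypothetical.

    ```java
    // Hypothetical hand-written scanner; the grammar rules are assumptions, not taken from the question.
    final class NestedBlockScanner {

        // True if the input is one or more back-to-back {!...} blocks with balanced inner braces.
        static boolean isValid(String input) {
            int n = input.length();
            if (n == 0) {
                return false; // at least one block is required
            }
            int i = 0;
            while (i < n) {
                if (!input.startsWith("{!", i)) {
                    return false; // anything other than a block opener here is an error
                }
                i += 2;
                int depth = 1; // the opening brace of this block is counted
                while (i < n && depth > 0) {
                    char c = input.charAt(i);
                    if (c == '{') {
                        depth++;
                    } else if (c == '}') {
                        depth--;
                    }
                    i++;
                }
                if (depth != 0) {
                    return false; // ran out of input before the block was closed
                }
            }
            return true;
        }

        public static void main(String[] args) {
            System.out.println(isValid("{!a{b}c}"));  // true: inner braces balance
            System.out.println(isValid("{!a} {!b}")); // false: space between blocks
            System.out.println(isValid("{!a{b}"));    // false: block never closes
        }
    }
    ```

    A real expression language would likely also need rules for quoting or escaping braces, which this sketch does not attempt.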