
Matcher and Regex in Java to output what's inside Square Brackets but only once


I am having major trouble understanding what is going wrong with a regex. For completeness, here is some background.

This program is supposed to implement a SIMPLESEM interpreter. The relevant grammar is:

< Expr >    ==>   < Term > {( + | - ) < Term >}

< Term >    ==>   < Factor > {( * | / | % ) < Factor >}

< Factor >  ==>   < Number > | D[< Expr >] | (< Expr >)

< Number >  ==>   0 | (1..9){0..9}

I was provided with this code, which is supposed to give me the contents inside the square brackets of a < Factor >, but it didn't work:

Matcher m;

(m = Pattern.compile("D\\[(.*)").matcher(expr)).find();
expr = parseExpr(m.group(1));
(m = Pattern.compile("\\](.*)").matcher(expr)).find();
expr = m.group(1);

As example input, I have this:

jumpt 5, D[0] == 0

The < Factor > concerned here is D[0]. It doesn't work because the code above feeds 0] into parseExpr(), which doesn't handle the leftover bracket, and it shouldn't. So I switched it to:

(m = Pattern.compile("D\\[(.*)").matcher(expr)).find();
expr = m.group(1);
(m = Pattern.compile("\\](.*)").matcher(expr)).find();
expr = parseExpr(m.group(1));

But this didn't work either, because of the Matcher/regex: I believe it produced an empty string. So then I tried this, which just gives me an error that there is no match:

(m = Pattern.compile("D\\[(.*)").matcher(expr)).find();
expr = m.group(1);

if(expr.contains("(.*)")) 
{
    (m = Pattern.compile("\\](.*)").matcher(expr)).find();
}
else
{
    (m = Pattern.compile("\\]").matcher(expr)).find();
}
expr = m.group(1);
expr = parseExpr(expr);

It throws an index-out-of-bounds exception at the second-to-last line. Thanks in advance for your help.


Solution

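  • The rules D[< Expr >] and (< Expr >) nest an < Expr > inside brackets, so extracting the contents means pairing each opening bracket with its matching closing bracket at arbitrary nesting depth. This is not something Java regex can handle, since java.util.regex doesn't support recursive patterns.

    In this case, regex is only useful for lexing (splitting the input into tokens); you need to write a custom parser for your language.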
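
    As a rough illustration of what such a custom parser could look like, here is a minimal recursive-descent sketch that follows the grammar from the question, one method per rule. The class name ExprParser, the position() accessor, and the int[] used to stand in for the data memory D are all assumptions made for this example, not part of the original code; error handling is deliberately thin.

```java
// Hypothetical sketch, not the asker's code: a recursive-descent evaluator
// for the <Expr> grammar. The int[] data memory is an assumed stand-in for D.
final class ExprParser {
    private final String src;
    private final int[] data;
    private int pos = 0;

    ExprParser(String src, int[] data) {
        this.src = src;
        this.data = data;
    }

    /** Index of the first character the parser has not consumed. */
    int position() {
        return pos;
    }

    // <Expr> ==> <Term> {( + | - ) <Term>}
    int parseExpr() {
        int value = parseTerm();
        while (true) {
            if (accept('+')) value += parseTerm();
            else if (accept('-')) value -= parseTerm();
            else return value;
        }
    }

    // <Term> ==> <Factor> {( * | / | % ) <Factor>}
    private int parseTerm() {
        int value = parseFactor();
        while (true) {
            if (accept('*')) value *= parseFactor();
            else if (accept('/')) value /= parseFactor();
            else if (accept('%')) value %= parseFactor();
            else return value;
        }
    }

    // <Factor> ==> <Number> | D[<Expr>] | (<Expr>)
    private int parseFactor() {
        if (accept('D')) {
            expect('[');
            int address = parseExpr(); // recursion takes care of nested brackets
            expect(']');
            return data[address];
        }
        if (accept('(')) {
            int value = parseExpr();
            expect(')');
            return value;
        }
        return parseNumber();
    }

    // <Number> ==> 0 | (1..9){0..9}
    private int parseNumber() {
        if (accept('0')) return 0;
        int start = pos; // accept() has already skipped leading spaces
        while (pos < src.length() && src.charAt(pos) >= '0' && src.charAt(pos) <= '9') {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Number expected at index " + pos);
        }
        return Integer.parseInt(src.substring(start, pos));
    }

    // Skips spaces, then consumes c if it is the next character.
    private boolean accept(char c) {
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
        if (pos < src.length() && src.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    private void expect(char c) {
        if (!accept(c)) {
            throw new IllegalArgumentException("'" + c + "' expected at index " + pos);
        }
    }
}
```

    With this approach there is no need to cut the bracket contents out with a regex first: new ExprParser("D[0] == 0", memory).parseExpr() evaluates D[0] and stops in front of ==, and position() tells the caller where the rest of the statement (here the comparison) begins.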