Tags: java, parsing, bbcode, javacc

How to match optional open/close tags in JavaCC?


Which JavaCC grammar can parse lines like these?

[b]content[/b]
content[/b]
[b]content

The JavaCC parser needs to accept all of these lines, but it must distinguish correct tagging from incorrect tagging.

Correct tags, as in the first line, have both an open and a close tag. When the tags are matched, the output is bold-formatted text.

Incorrect tags, as in lines 2 and 3, have no matching open or close tag. When these occur, they are written to the output as-is and are not interpreted as tags.

I have tried the JavaCC code below (LOOKAHEAD = 999999). The problem is that this grammar always matches everything as invalidTag() instead of bold(). How can I make sure that the JavaCC parser matches bold() whenever possible?

String parse() :
{}
{
    body() <EOF>
    { return buffer; }
}

void body() :
{}
{
    (content())*
}

void content() :
{}
{ 
    (text()|bold()|invalidTag())
}

void bold() :
{}
{
    { buffer += "<b>";  }
    <BOLDSTART>(content())*<BOLDEND>
    { buffer += "</b>"; }
}

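void invalidTag() :
{
    Token t;
}
{
    // Parenthesized so that the action below runs for either alternative.
    ( t = <BOLDSTART> | t = <BOLDEND> )
    { buffer += t.image; /* unmatched tag: write it to the output as-is */ }
}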

TOKEN :
{
    <TEXT : (<LETTER>|<DIGIT>|<PUNCT>|<OTHER>)+ >
    |<BOLDSTART : "[b]" >
    |<BOLDEND : "[/b]" >

    |<#LETTER : ["a"-"z","A"-"Z"] >
    |<#DIGIT : ["0"-"9"] >
    |<#PUNCT : [".", ":", ",", ";", "\t", "!", "?", " "] >
    |<#OTHER : ["*", "'", "$", "|", "+", "(", ")", "{", "}", "/", "%", "_", "-", "\"", "#", "<", ">", "=", "&", "\\"] >
}

Solution

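  • Your grammar is ambiguous: an input such as [b]content[/b] can be derived either as bold() or as invalidTag(), text(), invalidTag(). This is probably not your fault, as it will probably be very difficult to produce an unambiguous grammar for the problem you are trying to solve.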

    An LL(k) parser is probably not the best tool for this job.

    However, the tokenizer may still be useful, and using a stack to find matched and unmatched pairs of tags may be a suitable alternative.
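
The stack idea can be sketched in plain Java as follows. This is only an illustration under some assumptions, not a drop-in replacement for the grammar: the class name BoldTagMatcher and the method render are made up for this example, a regular expression stands in for the JavaCC tokenizer, and tags are allowed to pair up across line breaks (the question does not say whether that is wanted). An open tag is pushed on the stack, a close tag pops the most recent open tag, and anything left unpaired is copied to the output unchanged.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JavaCC or of the question's grammar.
class BoldTagMatcher {

    // Stand-in tokenizer: "[b]", "[/b]", or a run of any other characters.
    private static final Pattern TOKEN = Pattern.compile(
            "\\[b\\]|\\[/b\\]|(?:(?!\\[b\\]|\\[/b\\]).)+", Pattern.DOTALL);

    static String render(String input) {
        // 1. Tokenize.
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }

        // 2. Pair tags with a stack of open-tag positions.
        boolean[] matched = new boolean[tokens.size()];
        Deque<Integer> openTags = new ArrayDeque<>();
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            if (t.equals("[b]")) {
                openTags.push(i);
            } else if (t.equals("[/b]") && !openTags.isEmpty()) {
                matched[openTags.pop()] = true; // nearest unpaired open tag
                matched[i] = true;
            }
            // A close tag seen with an empty stack stays unmatched,
            // as do open tags still on the stack at the end.
        }

        // 3. Emit HTML for matched tags and the original text otherwise.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            if (matched[i]) {
                out.append(t.equals("[b]") ? "<b>" : "</b>");
            } else {
                out.append(t);
            }
        }
        return out.toString();
    }
}
```

With this sketch, BoldTagMatcher.render("[b]content[/b]") returns "<b>content</b>", while "content[/b]" and "[b]content" come back unchanged. In a real solution the token list could come from the JavaCC-generated token manager instead of the regular expression.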