java, parsing, bbcode, javacc

Why does my JavaCC parser not parse tokens smaller than 2 characters?


I'm working on a JavaCC parser that should parse BBcodes.

My JavaCC source code: pastebin.com (JUnit test: here)

The source code kind of works, but it does not accept single-character tokens; only multi-character strings are recognized.

It does parse this string:

"test[b]bold[/b]nothing[b]bold[/b]after"

But not:

"t[b]bold[/b]nothing[b]bold[/b]after"

I’m kind of lost here; any tips are welcome.


Solution

  • I figured it out. I downloaded JavaCC and compiled everything. With a single-character input, the output is:

    String: t
    Length: 1
    Call:   parse
      Call:   body
      Return: body
    Return: parse
    Exception in thread "main" ParseException: Encountered " <LETTER> "t "" at line 1, column 1.
    Was expecting one of:
        <EOF>
        "[b]" ...
        "[i]" ...
        "[u]" ...
        "[s]" ...
        "[url]" ...
        "[url=" ...
        "[img]" ...
        "[quote]" ...
        "[code]" ...
        "[color=" ...
        "[br]" ...
        <EOL> ...
        <TEXT> ...
        <TAGCHAR> ...
    

    I noticed that it found a <LETTER> token but didn't recognize it as <TEXT>.

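    That's where the problem lies. You've declared everything as tokens, and JavaCC's token manager picks the longest match at each position, falling back to declaration order when two tokens match the same length. A multi-character run such as "test" can only be matched in full by <TEXT>, so <TEXT> wins there. For "t", <LETTER> and <TEXT> both match exactly one character, so the one declared first wins, and that is <LETTER>, which the grammar does not expect at that point. Move the <LETTER> token after <TEXT> and it should work. You'll want to apply the same change to <DIGIT> and other such tokens.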
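
The tie-breaking rule described above can be imitated in plain Java. The sketch below is only a simulation of the "longest match, then first declared" rule, not JavaCC's generated code, and the two patterns are assumptions standing in for the <LETTER> and <TEXT> definitions, since the original grammar is not shown here. The class name `TokenOrderDemo` and its helper methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenOrderDemo {

    /**
     * Returns the name of the rule that wins at the start of the input:
     * the longest match wins, and on a tie the rule declared first wins.
     * Returns null if no rule matches.
     */
    static String firstToken(Map<String, Pattern> rules, String input) {
        String winner = null;
        int bestLength = 0;
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // lookingAt() anchors the match at the start of the input.
            if (m.lookingAt() && m.end() > bestLength) {
                // Strictly greater: an earlier rule keeps its win on a tie.
                bestLength = m.end();
                winner = rule.getKey();
            }
        }
        return winner;
    }

    /** Assumed original ordering: LETTER declared before TEXT. */
    static Map<String, Pattern> letterFirst() {
        Map<String, Pattern> rules = new LinkedHashMap<>(); // keeps declaration order
        rules.put("LETTER", Pattern.compile("[a-zA-Z]"));
        rules.put("TEXT", Pattern.compile("[a-zA-Z0-9]+"));
        return rules;
    }

    /** Suggested ordering: TEXT declared before LETTER. */
    static Map<String, Pattern> textFirst() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("TEXT", Pattern.compile("[a-zA-Z0-9]+"));
        rules.put("LETTER", Pattern.compile("[a-zA-Z]"));
        return rules;
    }

    public static void main(String[] args) {
        System.out.println(firstToken(letterFirst(), "t"));    // LETTER (tie, declared first)
        System.out.println(firstToken(letterFirst(), "test")); // TEXT (longer match)
        System.out.println(firstToken(textFirst(), "t"));      // TEXT (tie, declared first)
    }
}
```

With the assumed original ordering, "t" comes out as LETTER while "test" comes out as TEXT, which mirrors the behaviour in the question. If <LETTER> is only meant as a building block for <TEXT>, another option in JavaCC is to declare it as a private regular expression with the `#` prefix (for example `< #LETTER: ... >`), so that it is never produced as a token on its own.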