
What are the advantages/disadvantages of using ANTLR, JavaCC or JFlex over StringTokenizer and equivalents?


I am currently looking at implementing the language shown here in Java. The presentation is a little long, but it is essentially a DSL for creating dynamic speech. Example:

rule ExampleRule
{
    criteria Criterion1 Criterion2 Criterion3=value
    response ExampleResponse
    remember State:=1
    trigger Object TriggerName
    ApplyFacts "State1:1:0,State2:1:0"
}
response ExampleResponse
{
    say "Text" then object ExampleRule
    say "Text" then any ExampleRule
    say "Text"
    scene "Scenepath"
}

I have looked at the different parser generators such as ANTLR, JavaCC and JFlex, but I am wondering whether to just use StringTokenizer/Scanner and roll my own parser, since this is a hobby project.
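To illustrate what rolling my own might look like: StringTokenizer splits only on fixed delimiter characters, so quoted strings like those in the `say` lines need special handling, and a regex-based tokenizer seems more practical. A rough sketch (the token pattern is a placeholder, not a full lexical spec for the language):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Tokenizer {
    // Alternatives are tried in order: quoted strings first, then the
    // ":=" operator and single-character punctuation, then bare words.
    private static final Pattern TOKEN =
            Pattern.compile("\"[^\"]*\"|:=|[{}=]|[\\w.]+");

    public static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Quoted strings come through as single tokens, quotes included.
        System.out.println(tokenize("say \"Text\" then any ExampleRule"));
        System.out.println(tokenize("remember State:=1"));
    }
}
```

This handles the statement-per-line shape of the language, but error reporting, nesting, and anything beyond tokenizing would still be entirely on me.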

I had earlier decided on ANTLR, but ran into issues with ANTLR embedding full pathnames in the generated source code, and the runtime library seemed a bit heavyweight. I have not been able to find much information comparing the three parser generators, or comparing them with the built-in Java alternatives.

What are the advantages/disadvantages of each parser given the nature of the language?


Solution

  • The advantages of using parser generators:

    1. Correctness by construction. The generated parser accepts exactly the language specified in the grammar, and there are CS proofs of this for the various kinds of generators, going back to Knuth 1965. If you roll your own, e.g. recursive descent, you have no such proof, and no easy way to test for it either.

    2. Development time. Once you get your head around the generator's foibles, the parser is built about as fast as you can type.

    'Given the nature of the language' isn't all that relevant. Major practitioners made major mistakes in implementing nothing more complex than arithmetic expressions in the 1960s, which is why my point (1) is my point (1).
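To make point (1) concrete, here is what the start of a hand-rolled recursive-descent parser for just the `rule Name { ... }` header might look like, operating on a pre-tokenized line (the class and method names are illustrative, not from the question):

```java
import java.util.List;

// Illustrative recursive-descent parser for the "rule Name {" header
// of the DSL. Everything beyond the header is left as an exercise,
// which is exactly where hand-rolled parsers tend to accumulate bugs.
public class RuleParser {
    private final List<String> tokens;
    private int pos = 0;

    public RuleParser(List<String> tokens) {
        this.tokens = tokens;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    // Consume the next token, failing loudly if it is not what the
    // grammar requires at this point.
    private void expect(String expected) {
        String t = peek();
        if (!expected.equals(t)) {
            throw new IllegalStateException(
                    "expected '" + expected + "' but found '" + t + "'");
        }
        pos++;
    }

    // Grammar fragment: ruleHeader := "rule" IDENT "{"
    public String parseRuleHeader() {
        expect("rule");
        String name = tokens.get(pos++); // the rule's name
        expect("{");
        return name;
    }

    public static void main(String[] args) {
        RuleParser p = new RuleParser(List.of("rule", "ExampleRule", "{"));
        System.out.println(p.parseRuleHeader()); // prints ExampleRule
    }
}
```

Nothing guarantees that `parseRuleHeader` matches the intended grammar except the author's care; a generator derives that guarantee from the grammar itself.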