
Create mandatory token in ANTLR


I'm just getting into ANTLR and am trying to create a simple hello-world grammar. My goal is to make "Hello world" a mandatory string: an input of just "Hello" should be considered invalid, with an error stating that a "world" token was expected.

Edit: Please note that I do want "hello" and "world" to be separate tokens (consider them to be keywords) so that I can easily identify them separately.

I have the below helloworld.g4:

grammar helloworld;

WHITESPACE: [ \r\n\t]+ -> skip;
HELLO : 'Hello' ;
WORLD : 'world' ;

start : HELLO WORLD EOF ;

I have the following main.go:

package main

import (
    "fmt"
    "test/parser"

    "github.com/antlr/antlr4/runtime/Go/antlr"
)

const rule = `Hello`

type testListener struct {
    *parser.BasehelloworldListener
}

func main() {
    // Setup the input
    is := antlr.NewInputStream(rule)

    // Create the Lexer
    lexer := parser.NewhelloworldLexer(is)
    // Read all tokens

    for {
        t := lexer.NextToken()
        if t.GetTokenType() == antlr.TokenEOF {
            break
        }
        fmt.Printf("%s (%q)\n",
            lexer.SymbolicNames[t.GetTokenType()],
            t.GetText())
    }

    // Create the token stream
    stream := antlr.NewCommonTokenStream(lexer,
        antlr.TokenDefaultChannel)

    // Create the Parser
    p := parser.NewhelloworldParser(stream)

    // Finally parse the expression
    antlr.ParseTreeWalkerDefault.Walk(&testListener{}, p.Start())
}

I'm building a Go parser, and testing the outcome with the below command:

antlr -Dlanguage=Go -o parser helloworld.g4 && go run main.go

Which outputs:

HELLO ("Hello")
line 1:5 mismatched input '<EOF>' expecting 'Hello'

How can I get output stating that "world" is the expected token after "Hello"? The parser shouldn't expect another "Hello"; it should expect "world" and then EOF.


Solution

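  • In the lexer you've defined two separate tokens, so the lexer has no issue with the input "Hello": it simply produces a single HELLO token.

    If the hello token had to be followed by "world" at the lexer level, you would have to include that in the token itself:

    HELLO : 'Hello' ' '+ 'world';

    That conflicts with your requirement to keep the two keywords as separate tokens, though. With separate tokens, the check belongs in the parser: invoking the parser rule start on the input "Hello" will result in an error. This is the usual way to enforce the presence of the WORLD token (in the parser, not in the lexer).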

    EDIT

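    Your printing loop consumes all tokens before the parser runs, and you then feed this already-consumed lexer to the parser, so the only token the parser ever sees is EOF. That is why the message says it expects 'Hello'. Either skip printing the tokens, or re-initialize the lexer after printing them.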

    This should do:

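    func main() {
        // Set up the input; imports and testListener are the same as in the question.
        is := antlr.NewInputStream(`Hello`)

        // Create the lexer, but do not read any tokens from it here.
        lexer := parser.NewhelloworldLexer(is)

        // Let the token stream pull tokens from the untouched lexer.
        stream := antlr.NewCommonTokenStream(lexer,
            antlr.TokenDefaultChannel)

        // Create the parser.
        p := parser.NewhelloworldParser(stream)

        // Parse from the start rule and walk the resulting tree.
        antlr.ParseTreeWalkerDefault.Walk(&testListener{}, p.Start())
    }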
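
To see why draining the lexer first leads to "expecting 'Hello'", here is a minimal, self-contained sketch that does not use the ANTLR runtime at all. The names toyLexer, parseStart and drain are hypothetical stand-ins invented for this illustration, and the error text is only modeled on the message in the question; the real runtime may word the second case differently.

```go
package main

import (
	"fmt"
	"strings"
)

// toyLexer is a hypothetical stand-in for a generated lexer: it hands out
// each token exactly once and then reports "<EOF>" forever.
type toyLexer struct {
	words []string
	pos   int
}

func newToyLexer(input string) *toyLexer {
	// Splitting on whitespace plays the role of the WHITESPACE -> skip rule.
	return &toyLexer{words: strings.Fields(input)}
}

// NextToken returns the next word, or "<EOF>" once the input is exhausted.
func (l *toyLexer) NextToken() string {
	if l.pos >= len(l.words) {
		return "<EOF>"
	}
	w := l.words[l.pos]
	l.pos++
	return w
}

// parseStart mimics the rule `start : HELLO WORLD EOF ;` and returns the
// first mismatch it finds, or "ok" when the input matches.
func parseStart(l *toyLexer) string {
	for _, want := range []string{"Hello", "world", "<EOF>"} {
		if got := l.NextToken(); got != want {
			return fmt.Sprintf("mismatched input '%s' expecting '%s'", got, want)
		}
	}
	return "ok"
}

// drain reads every token, as the printing loop in the question does.
func drain(l *toyLexer) {
	for l.NextToken() != "<EOF>" {
	}
}

func main() {
	used := newToyLexer("Hello")
	drain(used)                   // tokens are gone after this
	fmt.Println(parseStart(used)) // the parser sees only <EOF>

	fresh := newToyLexer("Hello")
	fmt.Println(parseStart(fresh)) // the parser sees Hello, then <EOF>
}
```

With the drained lexer the first token the toy parser receives is already <EOF>, so it complains about a missing 'Hello'; with a fresh lexer it gets past 'Hello' and complains about 'world' instead, which is the behavior the question asks for.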