
How to skip input according to keywords in ANTLR4


I am new to ANTLR4 and wonder whether it can do what I am looking for. Here is an example input:

There is a lot of text 
in this file that i do not care 
about
Lithium 20 g/ml
Bor that should be skipped
Potassium  300g/ml
...

and code:

SempredParser.g4

parser grammar SempredParser;
options { tokenVocab=SempredLexer ;}

file        : line+ EOF;
line        : KEYWORD (NUM UNIT)+ '\n'+;

SempredLexer.g4:

lexer grammar SempredLexer;

//lexer rules

KEYWORD     : ('Lithium' | 'Potassium' ) ;
NL          : '\n';
NUM         : [0-9]+ ('.'[0-9]+)? ;
UNIT        : 'g/ml';
UNKNOWN     : . -> skip ;

I would like to skip all the lines that do not contain a KEYWORD (I have around 100 KEYWORDS). Note that I only use '\n' as a delimiter here and would ideally not have it appear in the parse output.

I read about island grammars in the Definitive guide and also tried using lexer modes, but could not make it work that way. Any hints and help are greatly appreciated.


Solution

  • You are pretty close; just avoid defining the line-break token twice. This grammar works for me (I put it into a combined grammar file):

    grammar IslandTest;
    
    start: NL+ line+ EOF;
    line:  KEYWORD (NUM UNIT)+ NL+;
    
    KEYWORD: ('Lithium' | 'Potassium');
    NUM:     [0-9]+ ('.' [0-9]+)?;
    UNIT:    'g/ml';
    
    NL:      '\n';
    UNKNOWN: . -> skip;
    

    With your input that gives me this parse tree:

    [image: parse tree]

    Note also that you cannot avoid the NL token in your output: your line rule is line-based, so the parser needs the newline token to tell where each line ends.
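
To see why the start rule begins with NL+ and why NL stays visible to the parser, it can help to look at the token stream. The following is a rough Python emulation of the five lexer rules, not ANTLR itself; the `tokenize` helper and the `TOKEN_RULES` table are made up for illustration, and the sketch assumes ANTLR's usual behavior of taking the longest match and, on a tie, the rule listed first:

```python
import re

# Emulation of the lexer rules from the grammar above, in the same order.
# Assumption: longest match wins; on a tie, the earlier rule wins.
TOKEN_RULES = [
    ("KEYWORD", re.compile(r"Lithium|Potassium")),
    ("NUM", re.compile(r"[0-9]+(?:\.[0-9]+)?")),
    ("UNIT", re.compile(r"g/ml")),
    ("NL", re.compile(r"\n")),
    ("UNKNOWN", re.compile(r".", re.DOTALL)),  # skipped, never emitted
]


def tokenize(text):
    """Return the token type names that these rules would emit for text."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in TOKEN_RULES:
            match = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches tie in length.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        pos += best_len  # UNKNOWN always matches one character, so pos advances
        if best_name != "UNKNOWN":
            tokens.append(best_name)
    return tokens


sample = (
    "There is a lot of text \n"
    "in this file that i do not care \n"
    "about\n"
    "Lithium 20 g/ml\n"
    "Bor that should be skipped\n"
    "Potassium  300g/ml\n"
)

print(tokenize(sample))
print(tokenize("Room 12 is closed\n"))
```

For the sample input this yields three NL tokens before the first KEYWORD, which is what the leading NL+ in start consumes, and two NL tokens after the Lithium line, which is why line ends in NL+ rather than a single NL. It also shows a limitation of the catch-all UNKNOWN rule: a line such as "Room 12 is closed" is not skipped entirely, because the 12 still becomes a NUM token that the line rule does not expect.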