parsing antlr antlr4 grammar indentation

How can I define an ANTLR4 indentation block based grammar?

I am trying to define a language using ANTLR4 to generate its parser. While the language is actually a bit more complex, this is a tiny valid example of a file I want the parser to read, which triggers the problem I am trying to fix:

features \\ Keyword which initializes the "features" block
   Server
       mandatory \\ Relation word
           FileSystem
           OperatingSystem
       optional \\ Relation word
           Logging

The features keyword simply starts the block, while mandatory and optional are relation words. The remaining words are plain words (called features in this context). What I want is to make Server a child of the features block, mandatory and optional both children of Server, FileSystem and OperatingSystem children of mandatory, and Logging a child of optional.

The following grammar is my attempt to achieve this structure:

grammar MyGrammar;

tokens {
    INDENT,
    DEDENT
}

@lexer::header {
from antlr_denter.DenterHelper import DenterHelper
from UVLParser import UVLParser
}
@lexer::members {
class UVLDenter(DenterHelper):
    def __init__(self, lexer, nl_token, indent_token, dedent_token, ignore_eof):
        super().__init__(nl_token, indent_token, dedent_token, ignore_eof)
        self.lexer: UVLLexer = lexer

    def pull_token(self):
        return super(UVLLexer, self.lexer).nextToken()

denter = None

def nextToken(self):
    if not self.denter:
        self.denter = self.UVLDenter(self, self.NL, UVLParser.INDENT, UVLParser.DEDENT, True)
    return self.denter.next_token()

}

// parser rules
feature_model: features?;
features: 'features' INDENT child;

child: feature_spec INDENT relation* DEDENT;
relation: relation_spec INDENT child* DEDENT;

feature_spec: WORD ('.' WORD)*;
relation_spec: RELATION_WORD;

//lexer rules

RELATION_WORD: ('alternative' | 'or' | 'optional' | 'mandatory');

WORD: [a-zA-Z][a-zA-Z0-9_]*;

WS: [ \n\r]+ -> skip;
NL: ('\r'? '\n' '\t');

I am using antlr-denter in order to manage indent and dedent.

Then, I am defining RELATION_WORD and WORD separately in the lexer.

Finally, the parser rules attempt to construct the structure I described above. I want the features keyword to be followed by a single child. A child is a feature spec followed by any number of relations between an INDENT and a DEDENT. Likewise, a relation is a relation spec followed by any number of children between an INDENT and a DEDENT, and this nesting can repeat indefinitely.

However, I can't manage to make the parser read this structure correctly. With the previous example as input, I am getting mandatory as child of Server, but not optional. Changing the example to this one:

features
   Server
       mandatory
       optional
           Logging

Both mandatory and optional are interpreted as children of mandatory. It must have something to do with INDENT and DEDENT interpretation to correctly find blocks, but I have been unable to find a solution so far.

Any ideas to fix this would be very welcome. Thanks in advance!

Solution

Try changing your child and relation rules as follows:

child: feature_spec (INDENT relation* DEDENT)?;
relation: relation_spec (INDENT child* DEDENT)?;

Explanation:

As @Kaby76 mentions, it's quite helpful to print out the token stream to understand what your parser actually sees.

I've not used antlr-denter before, but the way it plugs in, it would appear that you're not going to get a token stream just by using the grun tool.

As a substitute, I tried just making up explicit INDENT and DEDENT tokens (I used -> and <-, respectively).

Revised grammar:

grammar MyGrammar;

// parser rules
feature_model: features?;
features: 'features' INDENT child;

child: feature_spec INDENT relation* DEDENT;
relation: relation_spec INDENT child* DEDENT;

feature_spec: WORD ('.' WORD)*;
relation_spec: RELATION_WORD;

//lexer rules

RELATION_WORD: ('alternative' | 'or' | 'optional' | 'mandatory');

WORD: [a-zA-Z][a-zA-Z0-9_]*;

WS: [ \n\r]+ -> skip;

// Temporary
//NL: ('\r'? '\n' '\t');
NL: ('\r'? '\n' '\t') -> skip;
INDENT: '->';
DEDENT: '<-';

And I revised the input file to use the explicit tokens:

features
->Server
  ->mandatory
    optional
    ->Logging

With just this change, you'll notice that there are no <- tokens in your sample.

But now I can dump the token stream:

➜ grun MyGrammar tokens -tokens < MGIn.txt
[@0,0:7='features',<'features'>,1:0]
[@1,12:13='->',<'->'>,2:3]
[@2,14:19='Server',<WORD>,2:5]
[@3,28:29='->',<'->'>,3:7]
[@4,30:38='mandatory',<RELATION_WORD>,3:9]
[@5,47:48='->',<'->'>,4:7]
[@6,49:56='optional',<RELATION_WORD>,4:9]
[@7,69:70='->',<'->'>,5:11]
[@8,71:77='Logging',<WORD>,5:13]
[@9,78:77='<EOF>',<EOF>,5:20]

Now let's try parsing:

➜ grun MyGrammar feature_model -tree < MGIn.txt
line 4:9 mismatched input 'optional' expecting {WORD, '<-'}
line 5:20 mismatched input '<EOF>' expecting {'.', '->'}
(feature_model (features features -> (child (feature_spec Server) -> (relation (relation_spec mandatory) ->) (relation (relation_spec optional) -> (child (feature_spec Logging))) <missing '<-'>)))

So, your grammar requires 'mandatory' (a RELATION_WORD) to be followed by an INDENT and, eventually, a DEDENT, which isn't present. Its absence makes sense, since 'mandatory' has no children here. So it seems the INDENT/DEDENT pair should only be required when there are children:

Let's change that:

child: feature_spec (INDENT relation* DEDENT)?;
relation: relation_spec (INDENT child* DEDENT)?;

Try again:

➜ grun MyGrammar feature_model -tree < MGIn.txt
line 5:20 extraneous input '<EOF>' expecting {WORD, '<-'}
(feature_model (features features -> (child (feature_spec Server) -> (relation (relation_spec mandatory)) (relation (relation_spec optional) -> (child (feature_spec Logging))) <missing '<-'>)))

Now we're missing a <- (DEDENT) at EOF. The solution to this depends on whether antlr-denter closes all the open INDENTs at <EOF>.

Assuming it does, my fake input should look something like:

features
->Server
  ->mandatory
    optional
    ->Logging
    <-
  <-
<-
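To see where those three closing <- tokens would come from, here is a minimal Python sketch of an indentation tracker that closes every open block at end of input. This is not antlr-denter's implementation, only an illustration of the idea. The function name tokenize and the FEATURES token type are made up for this example, it assumes space-only indentation, and it omits the NL tokens a real denter works with.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (token type, text)

RELATION_WORDS = {"alternative", "or", "optional", "mandatory"}


def classify(word: str) -> str:
    """Map a word to a (hypothetical) token type name."""
    if word == "features":
        return "FEATURES"
    if word in RELATION_WORDS:
        return "RELATION_WORD"
    return "WORD"


def tokenize(source: str) -> List[Token]:
    """Turn indented lines into a flat token list with INDENT/DEDENT markers."""
    tokens: List[Token] = []
    indents = [0]  # stack of indentation widths for the blocks currently open
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines do not affect block structure
        width = len(line) - len(line.lstrip(" "))
        if width > indents[-1]:
            # Deeper than the current block: open a new one.
            indents.append(width)
            tokens.append(("INDENT", "->"))
        while width < indents[-1]:
            # Shallower: close blocks until the widths line up again.
            # (Inconsistent dedents are not diagnosed in this sketch.)
            indents.pop()
            tokens.append(("DEDENT", "<-"))
        word = line.strip()
        tokens.append((classify(word), word))
    # At end of input, close every block that is still open.
    while len(indents) > 1:
        indents.pop()
        tokens.append(("DEDENT", "<-"))
    return tokens


sample = (
    "features\n"
    "   Server\n"
    "       mandatory\n"
    "       optional\n"
    "           Logging\n"
)
print(" ".join(text for _, text in tokenize(sample)))
# prints: features -> Server -> mandatory optional -> Logging <- <- <-
```

The printed stream matches the hand-written input above: one -> per deeper level and three <- at the end, one for each block still open at EOF.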

And we try again:

grun MyGrammar feature_model -gui < MGIn.txt
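As a cross-check that doesn't depend on ANTLR, the corrected rules can be mirrored by a small hand-written recursive descent parser in Python. This is only a sketch under some assumptions: the Parser class and the FEATURES token type are invented for the example, feature_spec is simplified to a single WORD (no dotted names), and the token list is typed in by hand rather than produced by a lexer.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (token type, text)


class Parser:
    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = list(tokens) + [("EOF", "<EOF>")]
        self.pos = 0

    def peek(self) -> str:
        """Return the type of the next token without consuming it."""
        return self.tokens[self.pos][0]

    def expect(self, kind: str) -> str:
        """Consume a token of the given type and return its text."""
        tok_kind, text = self.tokens[self.pos]
        if tok_kind != kind:
            raise SyntaxError(f"expected {kind}, got {tok_kind} ({text!r})")
        self.pos += 1
        return text

    # features: 'features' INDENT child;
    def features(self):
        self.expect("FEATURES")
        self.expect("INDENT")
        return ("features", [self.child()])

    # child: feature_spec (INDENT relation* DEDENT)?;
    def child(self):
        name = self.expect("WORD")
        relations = []
        if self.peek() == "INDENT":  # the block is optional
            self.expect("INDENT")
            while self.peek() == "RELATION_WORD":
                relations.append(self.relation())
            self.expect("DEDENT")
        return (name, relations)

    # relation: relation_spec (INDENT child* DEDENT)?;
    def relation(self):
        name = self.expect("RELATION_WORD")
        children = []
        if self.peek() == "INDENT":  # the block is optional
            self.expect("INDENT")
            while self.peek() == "WORD":
                children.append(self.child())
            self.expect("DEDENT")
        return (name, children)


# features -> Server -> mandatory optional -> Logging <- <- <-
tokens = [
    ("FEATURES", "features"), ("INDENT", "->"), ("WORD", "Server"),
    ("INDENT", "->"), ("RELATION_WORD", "mandatory"),
    ("RELATION_WORD", "optional"), ("INDENT", "->"), ("WORD", "Logging"),
    ("DEDENT", "<-"), ("DEDENT", "<-"), ("DEDENT", "<-"),
]
parser = Parser(tokens)
tree = parser.features()
print(tree)
# prints: ('features', [('Server', [('mandatory', []), ('optional', [('Logging', [])])])])
```

With the optional groups, mandatory and optional both end up as children of Server, and Logging as a child of optional. In this sketch the last <- is left unconsumed, because the features rule opens an INDENT without a matching DEDENT. The grammar as written would tolerate that, since feature_model does not require EOF, but you may want to add the DEDENT (and an EOF) there.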