I am trying to define a language using ANTLR4 to generate its parser. While the language is actually a bit more complex, this is a tiny valid example of a file I want the parser to read, which triggers the problem I am trying to fix:
features \\ Keyword which initializes the "features" block
Server
mandatory \\ Relation word
FileSystem
OperatingSystem
optional \\ Relation word
Logging
features word simply starts the block, while mandatory and optional are relation words. The words remaining are just simple words (called features in this context). What I want is to make Server child of features block, then, mandatory and optional both children of Server and finally, FileSystem and OperatingSystem children of mandatory, and Logging child of optional.
The following grammar is my attempt to achieve this structure:
grammar MyGrammar;
tokens {
INDENT,
DEDENT
}
@lexer::header {
from antlr_denter.DenterHelper import DenterHelper
from UVLParser import UVLParser
}
@lexer::members {
class UVLDenter(DenterHelper):
def __init__(self, lexer, nl_token, indent_token, dedent_token, ignore_eof):
super().__init__(nl_token, indent_token, dedent_token, ignore_eof)
self.lexer: UVLLexer = lexer
def pull_token(self):
return super(UVLLexer, self.lexer).nextToken()
denter = None
def nextToken(self):
if not self.denter:
self.denter = self.UVLDenter(self, self.NL, UVLParser.INDENT, UVLParser.DEDENT, True)
return self.denter.next_token()
}
// parser rules
feature_model: features?;
features: 'features' INDENT child;
child: feature_spec INDENT relation* DEDENT;
relation: relation_spec INDENT child* DEDENT;
feature_spec: WORD ('.' WORD)*;
relation_spec: RELATION_WORD;
//lexer rules
RELATION_WORD: ('alternative' | 'or' | 'optional' | 'mandatory');
WORD: [a-zA-Z][a-zA-Z0-9_]*;
WS: [ \n\r]+ -> skip;
NL: ('\r'? '\n' '\t');
I am using antlr-denter in order to manage indent and dedent.
Then, I am defining RELATION_WORD and WORD separately in the lexer.
Finally, the parser rules attempt to construct the structure I described before. I want the features word to be followed by a single child. Then, any child is going to be a feature spec followed by any amount of relations between an INDENT and DEDENT. Same happens with relations being a relation spec followed by a similar set of children, with this loop being repeated indefinitely.
However, I can't manage to make the parser read this structure correctly. With the previous example as input, I am getting mandatory as child of Server, but not optional. Changing the example to this one:
features
Server
mandatory
optional
Logging
Both mandatory and optional are interpreted as children of mandatory. It must have something to do with INDENT and DEDENT interpretation to correctly find blocks, but I have been unable to find a solution so far.
Any ideas to fix this would be very welcome. Thanks in advance!
Try changing your child
and feature
rules as follows:
child: feature_spec (INDENT relation* DEDENT)?;
relation: relation_spec (INDENT child* DEDENT)?;
As @Kaby76 mentions, it's quite helpful to print out the token stream to understand how your parser stream sees the stream of tokens.
I've not used antlr-denter before, but the way it plugs in, it would appear that you're not going to get a token stream just by using the grun
tool.
As a substitute, I tried just making up INDENT and OUTDENT Tokens (I used ->
and <-
, respectively)
revised grammar:
grammar MyGrammar;
// parser rules
feature_model: features?;
features: 'features' INDENT child;
child: feature_spec INDENT relation* DEDENT;
relation: relation_spec INDENT child* DEDENT;
feature_spec: WORD ('.' WORD)*;
relation_spec: RELATION_WORD;
//lexer rules
RELATION_WORD: ('alternative' | 'or' | 'optional' | 'mandatory');
WORD: [a-zA-Z][a-zA-Z0-9_]*;
WS: [ \n\r]+ -> skip;
// Temporary
//NL: ('\r'? '\n' '\t');
NL: ('\r'? '\n' '\t') -> skip;
INDENT: '->';
DEDENT: '<-';
And revised to input file to use the explicit tokens:
features
->Server
->mandatory
optional
->Logging
Just making this change, you'll notice that there are no <-
tokens in your sample.
But, now I can dump the token stream:
➜ grun MyGrammar tokens -tokens < MGIn.txt
[@0,0:7='features',<'features'>,1:0]
[@1,12:13='->',<'->'>,2:3]
[@2,14:19='Server',<WORD>,2:5]
[@3,28:29='->',<'->'>,3:7]
[@4,30:38='mandatory',<RELATION_WORD>,3:9]
[@5,47:48='->',<'->'>,4:7]
[@6,49:56='optional',<RELATION_WORD>,4:9]
[@7,69:70='->',<'->'>,5:11]
[@8,71:77='Logging',<WORD>,5:13]
[@9,78:77='<EOF>',<EOF>,5:20]
Now let's try parsing:
➜ grun MyGrammar feature_model -tree < MGIn.txt
line 4:9 mismatched input 'optional' expecting {WORD, '<-'}
line 5:20 mismatched input '<EOF>' expecting {'.', '->'}
(feature_model (features features -> (child (feature_spec Server) -> (relation (relation_spec mandatory) ->) (relation (relation_spec optional) -> (child (feature_spec Logging))) <missing '<-'>)))
So, your grammar calls for 'mandatory'
(as a RELATION_WORD
) to be followed by an INDENT
as well as a DEDENT
(which isn't present). This makes sense as they don't have any children, So, it seems that the INDENT
/DEDENT
need to be connected to whether there are any children:
Let's change that:
child: feature_spec (INDENT relation* DEDENT)?;
relation: relation_spec (INDENT child* DEDENT)?;
Try again:
➜ grun MyGrammar feature_model -tree < MGIn.txt
➜ grun MyGrammar feature_model -tree < MGIn.txt
line 5:20 extraneous input '<EOF>' expecting {WORD, '<-'}
(feature_model (features features -> (child (feature_spec Server) -> (relation (relation_spec mandatory)) (relation (relation_spec optional) -> (child (feature_spec Logging))) <missing '<-'>)))
Now we're missing a <-
(OUTDENT
) at EOF. The solution to this depends on whether the antlr-denter closes all the INDENT
s at <EOF>
Assuming it does, my fake input should look something like:
features
->Server
->mandatory
optional
->Logging
<-
<-
<-
and, we try again:
grun MyGrammar feature_model -gui < MGIn.txt