Tags: yaml, antlr4, abstract-syntax-tree

How to handle the ':' character both as a separator and inside a value in an ANTLR4 grammar


I want to parse YAML with ANTLR4. The target file contains `image: xxx.com/node:8.14`. I wrote a grammar file like this:

```antlr
grammar Drone;

yaml: obj+ ;

obj: ID ':' value;
value: STRING;

ID
    : ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'-')+
    ;
STRING: ('a'..'z'|'A'..'Z'|'0'..'9'|'-'|'.'|'_'|'/'|':')+ ;

WS: [ \t]+ -> skip;
CRLF: [\r\n]+ ;
```

I got this result:

```text
[antlr4] ➜  dronemigrate antlr4-parse Drone.g4 yaml -tree  -trace drone.yml
line 1:0 mismatched input 'image:' expecting ID
enter   yaml, LT(1)=image:
enter   obj, LT(1)=image:
exit    obj, LT(1)=<EOF>
exit    yaml, LT(1)=<EOF>
(yaml:1 (obj:1 image: xxx.com/node:8.14))
```

When I remove the ':' character from the STRING lexer rule, I get this result:

```text
[antlr4] ➜  dronemigrate antlr4-parse Drone.g4 yaml -tree  -trace drone.yml
enter   yaml, LT(1)=image
enter   obj, LT(1)=image
consume [@0,0:4='image',<2>,1:0] rule obj
consume [@1,5:5=':',<1>,1:5] rule obj
enter   value, LT(1)=xxx.com/node
consume [@2,7:18='xxx.com/node',<3>,1:7] rule value
exit    value, LT(1)=:
exit    obj, LT(1)=:
exit    yaml, LT(1)=:
(yaml:1 (obj:1 image : (value:1 xxx.com/node)))
```

How can I allow ':' inside the value while still using it as the key/value separator?
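Both traces follow from how ANTLR's lexer chooses tokens, independently of what the parser expects: at each position it takes the longest match, and when two rules match the same length, the rule defined first wins (the implicit ':' literal from `obj` is defined before ID and STRING). The Python sketch below models that rule with regular expressions mirroring the question's lexer rules. It is a simplified emulation, not ANTLR itself; `longest_match_tokenize`, `RULES` and `RULES_NO_COLON` are illustrative names, and CRLF is left out because the input is a single line.

```python
import re

# Illustrative regex versions of the question's lexer rules, in ANTLR priority order.
# The ':' literal used in the parser rule `obj` becomes an implicit token that is
# defined before ID and STRING.
RULES = [
    ("':'", re.compile(r":")),
    ("ID", re.compile(r"[a-zA-Z][a-zA-Z0-9-]+")),
    ("STRING", re.compile(r"[a-zA-Z0-9\-._/:]+")),
    ("WS", re.compile(r"[ \t]+")),
]

# Same rules, but with ':' removed from STRING (the second experiment).
RULES_NO_COLON = [
    (name, re.compile(r"[a-zA-Z0-9\-._/]+")) if name == "STRING" else (name, pattern)
    for name, pattern in RULES
]


def longest_match_tokenize(text, rules):
    """Pick the longest match at each position; on a tie, the earlier rule wins."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches are equally long.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at {pos}: {text[pos:]!r}")
        if best_name != "WS":  # WS -> skip
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


# STRING 'image:', STRING 'xxx.com/node:8.14'
print(longest_match_tokenize("image: xxx.com/node:8.14", RULES))
# ID 'image', ':', STRING 'xxx.com/node', ':', STRING '8.14'
print(longest_match_tokenize("image: xxx.com/node:8.14", RULES_NO_COLON))
```

With ':' in STRING, `image:` (6 characters) beats ID's `image` (5 characters), so the parser never receives an ID, hence `mismatched input 'image:' expecting ID`. Without it, `image` ties between ID and STRING and ID wins, but the value is cut at `node` because the second ':' can no longer be part of a STRING.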


Solution

  • YAML is not well suited to parser generators like ANTLR, which keep a strict separation between the lexer and the parser. There are ways around this (using lexical modes), but fully implementing a YAML grammar is by no means trivial. That's why there is still an open issue in ANTLR's issue tracker asking for such a grammar. You could have a look at how it is done here: https://github.com/umaranis/FastYaml
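The lexical-mode workaround can be illustrated with a small Python sketch (an emulation, not ANTLR code): the lexer keeps a current mode, a ':' followed by blanks switches it into a value mode where the rest of the line, colons included, becomes a single VALUE token, and the newline switches it back. In ANTLR this roughly corresponds to a separate `lexer grammar` (modes are not allowed in a combined grammar such as `grammar Drone;`) using `-> pushMode(...)` and `-> popMode`. The token names, `MODES`, `tokenize` and `parse` are hypothetical, and the sketch assumes flat, single-line `key: value` pairs only: no indentation, sequences, quoting or comments.

```python
import re

# Hypothetical token rules per lexer mode, loosely mirroring an ANTLR lexer
# grammar that uses `mode`, `pushMode` and `popMode`.
MODES = {
    "DEFAULT": [
        ("KEY", re.compile(r"[a-zA-Z][a-zA-Z0-9-]*")),
        ("COLON", re.compile(r":[ \t]+")),   # ':' followed by blanks starts a value
        ("NEWLINE", re.compile(r"\r?\n")),
    ],
    "VALUE": [
        ("VALUE", re.compile(r"[^\r\n]+")),  # rest of the line, ':' included
        ("NEWLINE", re.compile(r"\r?\n")),
    ],
}


def tokenize(text):
    """Mode-aware lexer: the active mode decides which token rules apply."""
    mode, pos, tokens = "DEFAULT", 0, []
    while pos < len(text):
        # Within one mode the patterns start with distinct characters,
        # so at most one of them can match at a given position.
        for name, pattern in MODES[mode]:
            m = pattern.match(text, pos)
            if m:
                break
        else:
            raise ValueError(f"unexpected input at {pos}: {text[pos:]!r}")
        tokens.append((name, m.group()))
        pos = m.end()
        if name == "COLON":
            mode = "VALUE"    # like pushMode(VALUE)
        elif name == "NEWLINE":
            mode = "DEFAULT"  # like popMode at the end of the line
    return tokens


def parse(tokens):
    """Parser side: expects KEY COLON VALUE triples separated by NEWLINE tokens."""
    result, i = {}, 0
    while i < len(tokens):
        if tokens[i][0] == "NEWLINE":
            i += 1
            continue
        kinds = [kind for kind, _ in tokens[i:i + 3]]
        if kinds != ["KEY", "COLON", "VALUE"]:
            raise SyntaxError(f"expected KEY COLON VALUE, got {tokens[i:i + 3]}")
        result[tokens[i][1]] = tokens[i + 2][1]
        i += 3
    return result


print(parse(tokenize("image: xxx.com/node:8.14\nname: build\n")))
# {'image': 'xxx.com/node:8.14', 'name': 'build'}
```

The design point is that ':' is only a separator while the lexer is in the default mode; once a value has started, the same character is ordinary value text, which a single set of mode-less lexer rules cannot express.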