I want to parse yaml with antlr4.
Target file contains image: xxx.com/node:8.14
.
Then I wrote a grammar file like this:
grammar Drone;
yaml: obj+ ;
obj: ID ':' value;
value:
STRING;
ID
: ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'-')+
;
STRING: ('a'..'z'|'A'..'Z'|'0'..'9'|'-'|'.'|'_'|'/'|':')+ ;
WS: [ \t]+ -> skip;
CRLF: [\r\n]+ ;
got result like this:
[antlr4] ➜ dronemigrate antlr4-parse Drone.g4 yaml -tree -trace drone.yml
line 1:0 mismatched input 'image:' expecting ID
enter yaml, LT(1)=image:
enter obj, LT(1)=image:
exit obj, LT(1)=<EOF>
exit yaml, LT(1)=<EOF>
(yaml:1 (obj:1 image: xxx.com/node:8.14))
When I remove char ':' in Grammar rules file, got reulst like this:
[antlr4] ➜ dronemigrate antlr4-parse Drone.g4 yaml -tree -trace drone.yml
enter yaml, LT(1)=image
enter obj, LT(1)=image
consume [@0,0:4='image',<2>,1:0] rule obj
consume [@1,5:5=':',<1>,1:5] rule obj
enter value, LT(1)=xxx.com/node
consume [@2,7:18='xxx.com/node',<3>,1:7] rule value
exit value, LT(1)=:
exit obj, LT(1)=:
exit yaml, LT(1)=:
(yaml:1 (obj:1 image : (value:1 xxx.com/node)))
how to deal the ':' in string?
YAML is not well suited for parser generators like ANTLR (where there is a strict separation between the lexer and parser). There are ways around it (using lexical modes), but fully implementing a YAML grammar is by no means a trivial thing. That's why there is still an open issue in ANTLR's issue tracker to implement such a grammar. You could have a look at how it is done here: https://github.com/umaranis/FastYaml