I want to write a parser for a document very similar to the following Samba configuration file. It has many sections. Every section begins with a header line, which starts with [ followed by the section name (e.g. global, share_name) and runs to the end of the line. The lines after the header are the parameters for that section. We don't know where a section ends until we reach the beginning of the next section, i.e. a new line starting with [. How can I write a rule for this kind of document? All the ANTLR examples I have found know exactly where a section starts and where it ends. Thanks a lot!
[global]
netbios name = NETBIOS_NAME
workgroup = WORKGROUP
security = user
[SHARE_NAME]
comment = COMMENT
force create mode = 0770
locking = yes
[printers]
comment = COMMENT
path = /var/spool/samba
browseable = No
Here is my grammar:
grammar SambaConfiguration;

file
  : global_section
    share_name_section
    printer_section
    EOF
  ;

global_section
  : SECTION_TAG_START GLOBAL_SECTION_TAG (.)* SECTION_TAG_END NEW_LINE
    (~SECTION_TAG_START (.)* NEW_LINE)*
  ;

share_name_section
  : SECTION_TAG_START SHARE_NAME_SECTION_TAG (.)* SECTION_TAG_END NEW_LINE
    ((~SECTION_TAG_START) (.)* NEW_LINE)*
  ;

printer_section
  : SECTION_TAG_START PRINTER_SECTION_TAG (.)* SECTION_TAG_END NEW_LINE
    ((~SECTION_TAG_START) (.)* NEW_LINE)*
  ;

SECTION_TAG_START
  : '['
  ;

SECTION_TAG_END
  : ']'
  ;

GLOBAL_SECTION_TAG
  : 'global'
  ;

SHARE_NAME_SECTION_TAG
  : 'SHARE_NAME'
  ;

PRINTER_SECTION_TAG
  : 'printer'
  ;

NEW_LINE
  : '\r'? '\n' | '\r'
  ;

WHITE_SPACE
  : ' ' | '\t'
  ;
Somehow, it does not work properly. When I run it in ANTLRWorks, it gives me the following exception:
problem matching token at 12:19 NoViableAltException('o'@[1:1: Tokens : ( SECTION_TAG_START | SECTION_TAG_END | GLOBAL_SECTION_TAG | SHARE_NAME_SECTION_TAG | PRINTER_SECTION_TAG | NEW_LINE | WHITE_SPACE );])
Thanks.
The error message:
problem matching token at 12:19 NoViableAltException('o'@[1:1: Tokens : ( SECTION_TAG_START | SECTION_TAG_END | GLOBAL_SECTION_TAG | SHARE_NAME_SECTION_TAG | PRINTER_SECTION_TAG | NEW_LINE | WHITE_SPACE );])
means that ANTLR encountered a character, 'o', that it cannot create a token for. You probably expected it to be matched by the . in your parser rules, but it isn't. Inside parser rules, the . matches any token; only inside lexer rules does it match any character.
Your lexer only creates the following tokens: SECTION_TAG_START, SECTION_TAG_END, GLOBAL_SECTION_TAG, SHARE_NAME_SECTION_TAG, PRINTER_SECTION_TAG, NEW_LINE and WHITE_SPACE. So a . inside a parser rule matches any one of these tokens, nothing more.
Unless you're doing this to learn ANTLR, I'd hesitate to use ANTLR for this task. You can do this more easily by reading the input line by line and using some built-in string operations.
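To illustrate the line-by-line route, here is a minimal sketch in plain Java. The class name SambaConfigSketch and the method parse are made up for this example, and it rests on a few assumptions: a header is any trimmed line that starts with [ and ends with ], the key is everything before the first = on a line, lines starting with # or ; are comments, and anything before the first header is ignored. A section simply ends when the next header line shows up, so no explicit end marker is needed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any library: a sketch of line-by-line parsing.
class SambaConfigSketch {

    // Returns section name -> (parameter name -> value), in file order.
    static Map<String, Map<String, String>> parse(String text) {
        Map<String, Map<String, String>> sections = new LinkedHashMap<>();
        Map<String, String> current = null; // parameters of the section being read

        for (String raw : text.split("\\r?\\n|\\r")) {
            String line = raw.trim();
            // Assumption: blank lines and '#' / ';' comment lines carry no data.
            if (line.isEmpty() || line.startsWith("#") || line.startsWith(";")) {
                continue;
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                // A new header implicitly closes the previous section.
                String name = line.substring(1, line.length() - 1).trim();
                current = new LinkedHashMap<>();
                sections.put(name, current);
            } else if (current != null) {
                int eq = line.indexOf('=');
                if (eq < 0) {
                    continue; // assumption: lines without '=' are skipped
                }
                String key = line.substring(0, eq).trim();
                String value = line.substring(eq + 1).trim();
                current.put(key, value);
            }
        }
        return sections;
    }

    public static void main(String[] args) {
        String input = "[global]\nworkgroup = WORKGROUP\n[printers]\npath = /var/spool/samba\n";
        parse(input).forEach((name, params) -> System.out.println(name + " -> " + params));
    }
}
```

The nested LinkedHashMap keeps sections and parameters in the order they appear in the file, which is handy if you want to write the configuration back out later.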
Using ANTLR, you could do something similar to this:
grammar T;

parse
  : section* EOF
  ;

section
  : header line*
  ;

header
  : SECTION_TAG_START name=text SECTION_TAG_END NEW_LINE
    {
      System.out.println("name=" + $name.text);
    }
  ;

line
  : key=text ASSIGN value=text (NEW_LINE | EOF)
    {
      System.out.println(" key=`" + $key.text.trim() +
          "`, value=`" + $value.text.trim() + "`");
    }
  ;

text
  : OTHER+
  ;

SECTION_TAG_START : '[';
SECTION_TAG_END   : ']';
ASSIGN            : '=';
NEW_LINE          : '\r'? '\n';
OTHER             : . /* any other char: must be the last rule! */;
Parsing your example input would print the following to your console:
name=global
 key=`netbios name`, value=`NETBIOS_NAME`
 key=`workgroup`, value=`WORKGROUP`
 key=`security`, value=`user`
name=SHARE_NAME
 key=`comment`, value=`COMMENT`
 key=`force create mode`, value=`0770`
 key=`locking`, value=`yes`
name=printers
 key=`comment`, value=`COMMENT`
 key=`path`, value=`/var/spool/samba`
 key=`browseable`, value=`No`