How to enable a start condition at the beginning of a rule and disable it at the end? I have to ignore whitespace with some bison rules only. How to ignore whitespace inside nested brackets. For example, in a rule such as:
define_directive:
DEFINE '(' class_name ')'{ ... }
;
I'm trying to write a parser, with some more rules, for this sample code:
@/*
* @Template Family
* @Description sample script template for Mate Programming language
* (multi-line comment)
*/
@namespace(sample)
@require(String fatherName)
@require(String motherName)
@require(Array childrenNames)
@define(Family : Template) @// end of header anything can go in body section below (comment)
Family Description
==================
Father's Name: @(fatherName)
Mother's Name: @(motherName)
Number of child: @(childrenNamesCount,0) @// valuation operator is null safe (comment)
List of children's names
------------------------
@foreach(childName:childrenNames)
> @(childName)
@empty
> there is no child name to display.
@end
@@(varName) @// this should not be interpreted because escaped with @ (comment)
The lexer and parser are partially implemented. My problem is how to deal with whitespace inside statements introduced by keywords such as @foreach and @require. Whitespace should be ignored inside these statements.
Desired sample output:
Family Description
==================
Father's Name: Mira
Mother's Name: James
Number of child: 0
List of children's names
------------------------
> there is no child name to display.
@@(varName)
Bison file content:
command:
fileword
| valuation
| alternative
| loop
| command_directive
;
fileword:
tokenword { scriptlangy_echo(yytext,"fileword.tokenword"); }
| MAGICESC { scriptlangy_echo("@","fileword.MAGICESC"); }
;
tokenword:
IDENTIFIER | NUMBER | STRING_LITERAL | WHITESPACE
| INC_OP | DEC_OP | AND_OP | OR_OP | LE_OP | GE_OP | EQ_OP | NE_OP | L_OP | G_OP
| ';' | ',' | ':' | '=' | ']' | '.' | '&' | '[' | '!' | '~' | '-' | '+' | '*' | '/' | '%' | '^' | '|' | ')' | '}' | '?' | '{' | '('
;
valuation:
'@' '(' expression ')' {
fprintf(yyout, "<val>");
}
| '@' '(' expression ',' default_value ')' {
fprintf(yyout, "<val>");
}
;
loop:
for_loop
| foreach_loop
| while_loop
;
while_loop:
WHILE '(' expression ')' end_block
| WHILE '(' expression ')' commands end_block
;
for_loop:
FOR '(' expression_statement expression_statement expression ')' end_block
| FOR '(' expression_statement expression_statement expression ')' commands end_block
;
foreach_loop:
foreach_block end_block
| foreach_block empty_block end_block
;
foreach_block:
FOREACH '(' IDENTIFIER ')'
| FOREACH '(' IDENTIFIER ':' expression ')' commands
;
The key part of your question seems to be this:
I have to ignore whitespace with some bison rules only. How to ignore whitespace inside nested brackets.
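As I remarked in comments, your implementation idea of somehow doing this by having your parser rules manipulate scanner start conditions is pretty much a non-starter. A bison-generated parser often has to read a lookahead token before it can decide to reduce a rule, so by the time a rule's action runs, the scanner may already have tokenized input past the point where the start condition was supposed to change. Forget about that.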
Since evidently your scanner does not, in general, ignore whitespace, it must emit tokens that represent whitespace, or perhaps tokens that represent something else plus whitespace (ugly). If it emits whitespace tokens then the thing to do is simply to account for them in your grammar rules. This is completely possible. In fact, you can build a parser for any context-free language on top of a scanner that just returns every character as its own token. The scanner / parser dichotomy is a functional and conceptual convenience, not a necessity.
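To make that last point concrete, here is a minimal sketch in C of a scanner that returns every character as its own token. It assumes the input is an in-memory string; the names scan_input and scan_set_input are hypothetical and not part of your code. It relies on two bison conventions: yylex returns 0 at end of input, and a character-literal token such as '(' has the numeric value of that character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input buffer; a real scanner would more likely read from a FILE*. */
static const char *scan_input = "";

/* Point the scanner at a NUL-terminated string. */
void scan_set_input(const char *text)
{
    assert(text != NULL);
    scan_input = text;
}

/* Character-level scanner: every byte, whitespace included, is its own token.
 * Returns 0 at end of input, which is what a bison parser expects from yylex. */
int yylex(void)
{
    unsigned char c = (unsigned char)*scan_input;

    if (c == '\0')
        return 0;
    scan_input++;
    return c;
}
```

With a scanner like this, the grammar would refer to characters directly as literals such as '@', '(' and ' ', and every decision about where whitespace is allowed moves into the parser rules.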
For example, suppose we want to be able to parse numeric array literals, formed as a nonempty, comma-delimited list of decimal numbers enclosed in curly braces, with optional whitespace around commas and inside the braces. Suppose further that we have these terminal symbols to work with:
OPEN // open brace
CLOSE // close brace
NUM // maximal sequence of one or more decimal digits
COMMA // a comma
WS // a maximal run of whitespace
We might then write these rules:
array: array_start array_elements CLOSE;
array_start: OPEN
| OPEN WS
;
array_elements: array_element
| array_elements array_separator array_element
;
array_element: NUM
| NUM WS
;
array_separator: COMMA
| COMMA WS
;
There are, of course, many other ways to set up the details, but, generally speaking, this is how you handle whitespace with parser rules: not by ignoring it, but by accepting it.
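If you want to experiment with those rules without generating a parser, the following is a hand-written sketch in C that recognizes the same language. It is a recursive-descent recognizer with one token of lookahead, not what bison would generate, and every name in it (scan, take, with_optional_ws, parse_array, TOK_END, TOK_BAD) is hypothetical. The point it illustrates is the same: WS is an ordinary token that the rules accept explicitly where it is allowed.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

enum token { TOK_END, OPEN, CLOSE, NUM, COMMA, WS, TOK_BAD };

struct parser {
    const char *pos;     /* next unread character */
    enum token current;  /* one token of lookahead */
};

/* Scan one token; digit runs and whitespace runs are maximal. */
static enum token scan(const char **pos)
{
    const char *p = *pos;
    enum token t;

    if (*p == '\0')
        return TOK_END;
    if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p++;
        t = NUM;
    } else if (isspace((unsigned char)*p)) {
        while (isspace((unsigned char)*p))
            p++;
        t = WS;
    } else {
        switch (*p++) {
        case '{': t = OPEN;    break;
        case '}': t = CLOSE;   break;
        case ',': t = COMMA;   break;
        default:  t = TOK_BAD; break;
        }
    }
    *pos = p;
    return t;
}

/* Consume the current token if it has the wanted type. */
static bool take(struct parser *ps, enum token wanted)
{
    if (ps->current != wanted)
        return false;
    ps->current = scan(&ps->pos);
    return true;
}

/* X | X WS: the shape shared by array_start, array_element and array_separator. */
static bool with_optional_ws(struct parser *ps, enum token first)
{
    if (!take(ps, first))
        return false;
    take(ps, WS);  /* trailing whitespace is optional */
    return true;
}

/* array: array_start array_elements CLOSE, followed by end of input. */
bool parse_array(const char *text)
{
    struct parser ps;

    assert(text != NULL);
    ps.pos = text;
    ps.current = scan(&ps.pos);

    if (!with_optional_ws(&ps, OPEN))          /* array_start */
        return false;
    if (!with_optional_ws(&ps, NUM))           /* first array_element */
        return false;
    while (with_optional_ws(&ps, COMMA)) {     /* array_separator */
        if (!with_optional_ws(&ps, NUM))       /* next array_element */
            return false;
    }
    return take(&ps, CLOSE) && ps.current == TOK_END;
}
```

Under these assumptions, parse_array("{ 1 , 2 ,3 }") is accepted, while parse_array("{1 2}") is rejected: whitespace alone does not separate elements, because no rule accepts NUM WS NUM.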