Here is a starting point for a grammar:
%%{
machine xo;
char = "x" | "o";
group = "(" char* ")";
main := group;
}%%
It handles (xxxx(oo)()xx)
, for example. How do I extend it to allow nested groups; e.g. (xxxx(o(x)o)()xx
?
I know recursion is not, generally, supported by one Ragel machine. So this won't work:
group = "(" ( char | group )* ")";
From the Ragel State Machine Compiler User Guide (PDF): (bold text added for emphasis):
"In general Ragel cannot handle recursive structures because the grammar is interpreted as a regular language. However, depending on what needs to be parsed it is sometimes practical to implement the recursive parts using manual coding techniques. This often works in cases where the recursive structures are simple and easy to recognize, such as in the balancing of parentheses."
"One approach to parsing recursive structures is to use actions that increment and decrement counters or otherwise recognize the entry to and exit from recursive structures and then jump to the appropriate machine defnition using fcall and fret. Alternatively, semantic conditions can be used to test counter variables.
"A more traditional approach is to call a separate parsing function (expressed in the host lan- guage) when a recursive structure is entered, then later return when the end is recognized."
From a mailing list discussion on nested brackets, the same three approaches are mentioned:
Specify a growable stack using prepush and postpop and then use fcall and fret.
Counting and then verify in actions or conditions.
Call a new parsing function (in the host language) when the recursive structure is entered.
Can you point me to an example of each -- preferably using my example above -- in Ruby? Thanks!
A general pattern using fcall/fret is like:
balanced = [^(){}\[\]] |
'(' @{ fcall balancedTokensParen; } |
'[' @{ fcall balancedTokensBracket; } |
'{' @{ fcall balancedTokensBrace; };
balancedTokensParen := balanced* ')' @{ fret; };
balancedTokensBracket := balanced* ']' @{ fret; };
balancedTokensBrace := balanced* '}' @{ fret; };
So your case could be handled as
char = [xo];
group = '(' @{ fcall group_rest; };
group_rest := (char|group)* ')' @{ fret; };
main := group;
the lexer function should contain the stack
array, and you have to manually to check the top
to ensure there isn't unclosed '(':
stack = []
%% write init;
%% write exec;
if top > 0
cs = %%{ write error; }%%
end