I do not understand what Ragel considers a "final" state. IIRC the User's Guide says that states that are final before machine simplification remain final thereafter. When exactly is a state final, and how does one recognize this?
Application:
I'm using the state machine syntax to implement a string finder -- find ASCII strings with length greater than n, and print them. This means implementing a maximum length matcher, as below.
Despite the fact that the dot output shows no final states, the EOF transitions behave differently depending on which flavor of {$%@}eof
is used. I do not understand why this should be. For example, in the has_string
state below, using %eof
instead of @eof
causes both the commit_nonstring_eof
and commit_string_eof
actions to be called from one of the generated/synthetic states terminating the matching state.
Here is a working state machine. Note that the far right hand node exit action has exactly one action: commit_nonstring_eof
.
Here is a broken state machine. Note that the far right hand node exit action calls both
commit_string_eof
and commit_nonstring_eof
action commit_string { }
action commit_string_eof { }
action commit_nonstring_eof { }
action set_mark { }
action reset {
/* Force the machine back into state 1. This happens after
* an incomplete match when some graphical characters are
* consumed, but not enough for use to keep the string. */
fgoto start;
}
# Matching classes union to 0x00 .. 0xFF
graphic = (0x09 | 0x20 .. 0x7E);
non_graphic = (0x00 .. 0x08 | 0x0A .. 0x1F | 0x7F .. 0xFF);
collector = (
start: (
# Set the mark if we have a graphic character,
# otherwise go to non_graphic state and consume input
graphic @set_mark -> has_glyph |
non_graphic -> no_glyph
) $eof(commit_nonstring_eof),
no_glyph: (
# Consume input until a graphic character is encountered
non_graphic -> no_glyph |
graphic @set_mark -> has_glyph
) $eof(commit_nonstring_eof),
has_glyph: (
# We already matched one graphic character to get here
# from start or no_glyph. Try to match N-1 before allowing
# the string to be committed. If we don't get to N-1,
# drop back to the start state
graphic{3} $lerr(reset) -> has_string
) @eof(commit_nonstring_eof),
has_string: (
# Already consumed our quota of N graphic characters;
# consume input until we run out of graphic characters
# then reset the machine. All exiting edges should commit
# the string. We differentiate between exiting on a non-graphic
# input that shouldn't be added to the string and exiting
# on a (graphic) EOF that should be added.
graphic* non_graphic -> start
) %from(commit_string) @eof(commit_string_eof) // okay
#) %from(commit_string) %eof(commit_string_eof) // bad
); #$debug;
main := (collector)+;
I think that for the concatenated machines a.b
the final state occurs for a
when there is a transition from a
to the first character of b
. (cf. "Regular Language Operators / Concatenation" in the manual).
Despite the fact that the dot output shows no final states
Dot output shows a lot of final states. Transition from 7 to 6 makes 7 final, for example. Transition from 6 to 1 makes 6 final, and so on.