Search code examples
state-machineragel

Ragel, final states, and EOF


I do not understand what Ragel considers a "final" state. IIRC the User's Guide says that states that are final before machine simplification remain final thereafter. When exactly is a state final, and how does one recognize this?


Application:

I'm using the state machine syntax to implement a string finder -- find ASCII strings with length greater than n, and print them. This means implementing a maximum length matcher, as below.

Despite the fact that the dot output shows no final states, the EOF transitions behave differently depending on which flavor of {$%@}eof is used. I do not understand why this should be. For example, in the has_string state below, using %eof instead of @eof causes both the commit_nonstring_eof and commit_string_eof actions to be called from one of the generated/synthetic states terminating the matching state.

Here is a working state machine. Note that the far right hand node exit action has exactly one action: commit_nonstring_eof. Working state machine

Here is a broken state machine. Note that the far right hand node exit action calls both commit_string_eof and commit_nonstring_eof Bad state machine

action commit_string { }
action commit_string_eof { }
action commit_nonstring_eof { }
action set_mark { }

action reset {
   /* Force the machine back into state 1. This happens after
    * an incomplete match when some graphical characters are
    * consumed, but not enough for use to keep the string. */
    fgoto start;
 }

 # Matching classes union to 0x00 .. 0xFF
 graphic = (0x09 | 0x20 .. 0x7E);
 non_graphic =  (0x00 .. 0x08 | 0x0A .. 0x1F | 0x7F .. 0xFF);

 collector = (

 start: (
     # Set the mark if we have a graphic character,
     # otherwise go to non_graphic state and consume input
     graphic @set_mark -> has_glyph |
     non_graphic -> no_glyph
 ) $eof(commit_nonstring_eof),

 no_glyph: (
     # Consume input until a graphic character is encountered
     non_graphic -> no_glyph |
     graphic @set_mark -> has_glyph
 ) $eof(commit_nonstring_eof),

 has_glyph: (
      # We already matched one graphic character to get here
      # from start or no_glyph. Try to match N-1 before allowing
      # the string to be committed. If we don't get to N-1,
      # drop back to the start state
      graphic{3} $lerr(reset) -> has_string
  ) @eof(commit_nonstring_eof),

  has_string: (
      # Already consumed our quota of N graphic characters;
      # consume input until we run out of graphic characters
      # then reset the machine. All exiting edges should commit
      # the string. We differentiate between exiting on a non-graphic
      # input that shouldn't be added to the string and exiting
      # on a (graphic) EOF that should be added.
      graphic* non_graphic -> start
  ) %from(commit_string) @eof(commit_string_eof) // okay
 #) %from(commit_string) %eof(commit_string_eof) // bad

); #$debug;

main := (collector)+;

Solution

  • I think that for the concatenated machines a.b the final state occurs for a when there is a transition from a to the first character of b. (cf. "Regular Language Operators / Concatenation" in the manual).

    Despite the fact that the dot output shows no final states

    Dot output shows a lot of final states. Transition from 7 to 6 makes 7 final, for example. Transition from 6 to 1 makes 6 final, and so on.