f# fparsec

FParsec ‘many’ primitive fails when stream.UserState updated


The following routine is a slightly simplified version of the parser tracing wrapper from the official documentation ("Tracing a parser").

let (<!>) (parser: Parser<_,USER_STATE>) label : Parser<_,USER_STATE> =
    fun stream ->
        do stream.UserState <- stream.UserState  // <= works if commented out!
        let reply = parser stream
        reply

This wrapper allows inspection of reply as parsers execute and the updating of stream.UserState as needed.

NOTE: this code just copies stream.UserState onto itself, effectively doing nothing, because that is the minimal operation that reproduces the error described below. The official documentation's example that manipulates stream.UserState ("Recursive grammars with nesting restrictions") does much more with it.

Commenting out do stream.UserState <- stream.UserState (line 3 of the wrapper) allows both repeating (list-producing) and non-repeating FParsec primitives to succeed. For the list-producing primitives, the final failure of the subordinate parser is then unwound and the application of, for example, many succeeds.

If parser in the above wrapper is NOT an FParsec primitive that creates a list of results from the repeated application of subordinate parsers (like many or sepEndBy), then this code parses successfully.

If parser IS an FParsec repeating primitive (e.g. many or sepEndBy), then the failure of the subordinate parser is passed back out as a failure of the repeating primitive itself, which is unexpected.
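A minimal sketch of the two cases, in case it helps to reproduce the problem. The UserState alias (an int), the input "aab", and the helper succeeds are made up for this illustration; only the wrapper body comes from the code above.

```fsharp
// In an F# script, first reference the package: #r "nuget: FParsec"
open FParsec

type UserState = int   // hypothetical user state type for this sketch

// The wrapper from the question; the label is unused, as in the original.
let (<!>) (parser: Parser<_, UserState>) (label: string) : Parser<_, UserState> =
    fun stream ->
        stream.UserState <- stream.UserState   // the self-assignment in question
        parser stream

// Same grammar twice: once without the wrapper, once with the
// subordinate parser wrapped.
let plain   : Parser<string list, UserState> = many (pstring "a")
let wrapped : Parser<string list, UserState> = many (pstring "a" <!> "a")

// Hypothetical helper: true if the parser succeeds on the input.
let succeeds p input =
    match runParserOnString p 0 "" input with
    | Success _ -> true
    | Failure _ -> false
```

With the behavior described above, succeeds plain "aab" should be true, while succeeds wrapped "aab" should be false, because the last application of the wrapped pstring "a" fails after the assignment has already run.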

Why does including do stream.UserState <- stream.UserState cause FParsec primitives like many to fail?

EDIT1: Please note that the FParsec documentation does an assignment to stream.UserState as done in this question. @brianbern, I don't understand from your post how what I am doing is wrong given the documentation. Thanks!

How does one assign to stream.UserState without breaking calls to the FParsec many primitive?


Solution

  • The reason this happens is that the setter for CharStream.UserState increments the stream's StateTag as a side-effect. From the FParsec source code:

    public TUserState UserState {
        get { return _UserState; }
        set { _UserState = value; ++StateTag; }
    }
    

    So when you assign the stream's UserState to itself, it's not the same as doing nothing.

    UPDATE: The documentation for the “many” parser states:

    The parser many p repeatedly applies the parser p until p fails. It returns a list of the results returned by p. At the end of the sequence p must fail without changing the parser state and without signalling a FatalError, otherwise many p will fail with the error reported by p.

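    The example in the FParsec documentation doesn't modify the parser state at the end of the sequence, but your example parser modifies the parser state every time it is called, including the final call in which the subordinate parser fails. To many, that final failure no longer looks like a failure "without changing the parser state", so it is reported as a failure of many itself.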
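One possible workaround, sketched under assumptions: write to UserState only after the wrapped parser has succeeded, so that a failing subordinate parser leaves the StateTag untouched. The UserState alias (an int used as a success counter), the probe stateTagChanged, and the input "aab" are hypothetical; this is not the wrapper from the documentation.

```fsharp
// In an F# script, first reference the package: #r "nuget: FParsec"
open FParsec

type UserState = int   // hypothetical: counts successful wrapped parsers

// Hypothetical probe: does a self-assignment change the StateTag?
let stateTagChanged : Parser<bool, UserState> =
    fun stream ->
        let before = stream.StateTag
        stream.UserState <- stream.UserState
        Reply(before <> stream.StateTag)

// Variant of the wrapper that touches UserState only after success,
// so a failing parser does not change the parser state.
let (<!>) (parser: Parser<_, UserState>) (label: string) : Parser<_, UserState> =
    fun stream ->
        let reply = parser stream
        if reply.Status = ReplyStatus.Ok then
            stream.UserState <- stream.UserState + 1
        reply

let counted : Parser<string list, UserState> = many (pstring "a" <!> "a")

let tagChanged =
    match runParserOnString stateTagChanged 0 "" "" with
    | Success (changed, _, _) -> changed
    | Failure (msg, _, _) -> failwith msg

let countedResult =
    match runParserOnString counted 0 "" "aab" with
    | Success (items, count, _) -> Some (items, count)
    | Failure _ -> None
```

Given the setter shown above, tagChanged should be true, and countedResult should be Some (["a"; "a"], 2): the two successful applications update the state, while the final failing one does not, so many can end the sequence normally.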