Search code examples
parsingf#fparsec

FParsec and postfix modifiers to parsed items


As an exercise for myself I'm using FParsec to write a function that can generate a random string from a spec in (limited) Regex form.

E.g.

Input: ([Hh]ello ){1,3} world!?
Output: Hello hello world!

Input: (\d{1,2}\.){3}(\d{1,2})
Output: 38.12.29.05

I have a lot of it working, but I'm having trouble with the idea of postfix terms (i.e. that a parser might need to go back and modify the output, not just the position in the Char stream). E.g. "a" vs "a+"

Here's a cut-down version of my domain types:

type Count =
    | ExactCount of int
    | MinCount of int
    | MaxCount of int
    | RangeCount of int * int

type Term =
    | CharLiteral of char
    | Count of Term * Count

type RandExpSpec = Term list

So the input ab should generate [CharLiteral 'a'; CharLiteral 'b'] but ab+ should generate [CharLiteral 'a'; Count (CharLiteral 'b', MinCount 1)]. So that kind of implies that, on encountering a Count term in the stream, the parser needs to backtrack through the output in order to wrap the last term in another object.

Now, I don't know how to do that. Here's my current parsing definition, that does (mostly) work but is very inefficient:

let parseCharLiteral = choice [ letter; digit ] |>> CharLiteral

let rec parseTerm =
    parse.Delay(fun () -> choice [ parseCharLiteral ])

and parseCount =
    parseTerm
    .>>. choice [ skipChar '*' >>% (MinCount 0)
                  skipChar '+' >>% (MinCount 1)
                  skipChar '?' >>% (RangeCount(0, 1)) ]
    |>> Count

let parseTerms =
    many ((attempt parseCount) <|> parseTerm) .>> eof

You can see that in parseCount I call parseTerm first, then parse the actual count information. Then, in parseTerms I attempt the parseCount parser every time, backtracking through the input if it doesn't work. This is very inefficient because I'm essentially making two passes on almost every char in the input stream just in case it's followed by a count modifier.

Is there a more efficient way to do this? I feel like I should be writing something more like:

let parseCharLiteral = choice [ letter; digit ] |>> CharLiteral

let rec parseTerm =
    parse.Delay(fun () -> choice [ parseCharLiteral ] .>>. (attempt parseCount))

and parseCount =
    choice [ skipChar '*' >>% (MinCount 0)
             skipChar '+' >>% (MinCount 1)
             skipChar '?' >>% (RangeCount(0, 1)) ]
    |>> Count

let parseTerms =
    many parseTerm .>> eof

but I can't make that work since parseCount qould need to wrap the previous term returned by parseTerm.


Solution

  • I think you can use opt to allow parseCount to not find a count if there isn't one:

    let parseCount =
        parseTerm
        .>>. opt (choice [ skipChar '*' >>% (MinCount 0)
                           skipChar '+' >>% (MinCount 1)
                           skipChar '?' >>% (RangeCount(0, 1)) ])
        |>> function
        | term, None -> term
        | term, Some count -> Count (term, count)
    
    let parseTerms =
        many parseCount .>> eof