f# fparsec

How to add a condition that a parsed number must satisfy in FParsec?


I am trying to parse an int32 with FParsec, with the additional restriction that the number must be less than some maximum value. Is there a way to do this without writing my own custom parser (as below), and/or is my custom parser (below) an appropriate way of meeting the requirement?

I ask because most of the built-in library functions seem to revolve around a char satisfying certain predicates and not any other type.

let pRow: Parser<int> = 
   let error = messageError ("int parsed larger than maxRows")
   let mutable res = Reply(Error, error)
   fun stream ->
      let reply = pint32 stream
      if reply.Status = Ok && reply.Result <= 1000000 then
         res <- reply
      res

UPDATE

Below is an attempt at a more fitting FParsec solution based on the direction given in the comment below:

let pRow2: Parser<int> = 
   pint32 >>= (fun x -> if x <= 1048576 then (preturn x) else fail "int parsed larger than maxRows")

Is this the correct way to do it?


Solution

  • You've done excellent research and almost answered your own question.

    Generally, there are two approaches:

    1. Unconditionally parse an int and let later code check it for validity.
    2. Use a guard rule bound to the parser. In this case (>>=) is the right tool.
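
The two approaches above can be sketched roughly as follows. This is a minimal sketch, assuming the limit 1048576 from the question's second snippet; the names maxRows, parseThenCheck, pRow and parseWithGuard are made up for illustration. Result.Ok and Result.Error are written fully qualified because opening FParsec brings its own Ok and Error reply-status values into scope.

```fsharp
open FParsec

// Assumed limit, taken from the question's second snippet.
let maxRows = 1048576

// Approach 1: parse any int32, then validate the value in ordinary code.
let parseThenCheck (input: string) : Result<int, string> =
    match run pint32 input with
    | Success (n, _, _) when n <= maxRows -> Result.Ok n
    | Success (n, _, _) -> Result.Error (sprintf "%d is larger than maxRows" n)
    | Failure (msg, _, _) -> Result.Error msg

// Approach 2: the guard is part of the parser itself, bound with (>>=).
let pRow : Parser<int, unit> =
    pint32 >>= (fun n ->
        if n <= maxRows then preturn n
        else fail "int parsed larger than maxRows")

let parseWithGuard (input: string) : Result<int, string> =
    match run pRow input with
    | Success (n, _, _) -> Result.Ok n
    | Failure (msg, _, _) -> Result.Error msg
```

With approach 1 the parser itself never rejects a large number, so the check cannot influence which parser runs next. With approach 2 the failure is reported by FParsec, with a position, like any other parse error.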

    In order to make a good choice, ask yourself whether an integer that fails the guard rule should be given "another chance" by triggering another parser.

    Here's what I mean. In real-life projects, parsers are usually combined into chains of alternatives: if one parser fails, the next one is attempted. For example, in this question, some programming language is parsed, so it needs something like:

    let pContent =
        pLineComment <|> pOperator <|> pNumeral <|> pKeyword <|> pIdentifier
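
Since the parsers named in that chain are not shown, here is a smaller self-contained sketch of the same idea. The Token type, pToken and tokenize are hypothetical names introduced only for this illustration; pint32, many1Satisfy, isLetter and (<|>) are FParsec functions.

```fsharp
open FParsec

type Token =
    | Number of int
    | Word of string

// Alternatives are tried in order. The second one runs only if the first
// fails without consuming any input.
let pToken : Parser<Token, unit> =
    (pint32 |>> Number) <|> (many1Satisfy isLetter |>> Word)

let tokenize (input: string) : Token option =
    match run pToken input with
    | Success (token, _, _) -> Some token
    | Failure (_, _, _) -> None
```

Here "abc" falls through to the second alternative because pint32 fails on the first character without consuming anything.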
    

    Theoretically, your DSL may need to differentiate a "small int value" from another type:

    open FParsec

    /// The resulting type, or DSL
    type Output =
        | SmallValue of int
        | LargeValueAndString of int * string
        | Comment of string

    // A run of non-whitespace characters
    let pWord = many1Satisfy (fun c -> not (System.Char.IsWhiteSpace c))

    let pSmallValue =
        pint32 >>= (fun x -> if x <= 1048576 then (preturn x) else fail "int parsed larger than maxRows")
        |>> SmallValue
    let pLargeValueAndString =
        pint32 .>> spaces1 .>>. pWord // an int, whitespace, then a word
        |>> LargeValueAndString
    let pComment =
        spaces >>. pWord // optional leading whitespace, then a word
        |>> Comment

    let pCombined =
        [ pSmallValue; pLargeValueAndString; pComment ]
        |> List.map attempt // backtrack on failure so the next parser can be tried
        |> choice // on each iteration, one of the parsers must succeed
        |> many // a loop
    
    

    Built this way, pCombined behaves as follows:

    • "42 ABC" gets parsed as [ SmallValue 42 ; Comment "ABC" ]
    • "1234567 ABC" gets parsed as [ LargeValueAndString(1234567, "ABC") ]

    As we see, the guard rule impacts how the parsers are applied, so the guard rule has to be within the parsing process.

    If, however, you don't need this complication (e.g., an int is parsed unconditionally), your first snippet is just fine.
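
When the guard does belong inside the parser, it can be factored into a small reusable combinator, which addresses the observation that the built-in predicate functions mostly deal with chars. The following is only a sketch: satisfying is a hypothetical helper, not part of FParsec, and the Row type, pSmall, pRowKind and classify are names invented for the example.

```fsharp
open FParsec

// Hypothetical helper (not part of FParsec): run p, then keep its result
// only if it passes pred; otherwise fail with msg.
let satisfying (pred: 'a -> bool) (msg: string) (p: Parser<'a, 'u>) : Parser<'a, 'u> =
    p >>= (fun x -> if pred x then preturn x else fail msg)

type Row =
    | Small of int
    | Large of int

let pSmall : Parser<Row, unit> =
    pint32
    |> satisfying (fun n -> n <= 1048576) "int parsed larger than maxRows"
    |>> Small

// The guard fails after pint32 has already consumed the digits, so attempt
// is needed to rewind the input before the alternative reparses the number.
let pRowKind : Parser<Row, unit> =
    attempt pSmall <|> (pint32 |>> Large)

let classify (input: string) : Row option =
    match run pRowKind input with
    | Success (row, _, _) -> Some row
    | Failure (_, _, _) -> None
```

Without attempt, (<|>) would not try the second alternative for a large number, because the first one fails after consuming input.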