Parsing int or float with FParsec


I'm trying to parse a file, using FParsec, which consists of either float or int values. I'm facing two problems that I can't find a good solution for.

1

Both pint32 and pfloat will successfully parse the same string but give different answers, e.g. pint32 will return 3 when parsing the string "3.0" and pfloat will return 3.0 when parsing the same string. Is it possible to try parsing a floating point value using pint32 and have it fail if the string is "3.0"?

In other words, is there a way to make the following code work:

let parseFloatOrInt lines =
    let rec loop intvalues floatvalues lines =
        match lines with
        | [] -> floatvalues, intvalues
        | line::rest ->
            match run floatWs line with
            | Success (r, _, _) -> loop intvalues (r::floatvalues) rest
            | Failure _ -> 
                match run intWs line with
                | Success (r, _, _) -> loop (r::intvalues) floatvalues rest
                | Failure _ -> loop intvalues floatvalues rest

    loop [] [] lines

This piece of code will correctly place all floating point values in the floatvalues list, but because pfloat returns 3.0 when parsing the string "3", all integer values will also be placed in the floatvalues list.

2

The above code example seems a bit clumsy to me, so I'm guessing there must be a better way to do it. I considered combining them using choice, however both parsers must return the same type for that to work. I guess I could make a discriminated union with one option for float and one for int and convert the output from pint32 and pfloat using the |>> operator. However, I'm wondering if there is a better solution?


Solution

  • You're on the right path in thinking about defining domain data and separating the definition of the parsers from their usage on source data. This seems to be a good approach, because as your real-life project grows, you will probably need more data types.

    Here's how I would write it:

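    open FParsec

    // assumption: `ws` is not defined in the original snippet;
    // here it simply skips any optional whitespace
    let ws = spaces

    /// The resulting type, or DSL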
    type MyData =
        | IntValue of int
        | FloatValue of float
        | Error  // special case for all parse failures
    
    // Then, let's define individual parsers:
    let pMyInt =
        pint32
        |>> IntValue
    
    // this is an alternative version of the float parser.
    // it ensures that the value has a non-zero fractional part.
    // caveat: this naive approach would treat values like 42.0 as integers
    let pMyFloat =
        pfloat
        // both operands must be floats: `x % 1` does not type-check for a float x
        >>= (fun x -> if x % 1.0 = 0.0 then fail "Not a float" else preturn (FloatValue x))
    let pError =
        // this parser must consume some input,
        // otherwise combined with `many` it would hang in a dead loop
        skipAnyChar
        >>. preturn Error
    
    // Now, the combined parser:
    let pCombined =
        [ pMyFloat; pMyInt; pError ]    // note, future parsers will be added here;
                                        // mind the order as float supersedes the int,
                                        // and Error must be the last
        |> List.map (fun p -> p .>> ws) // I'm too lazy to add whitespace skipping
                                        // into each individual parser
        |> List.map attempt             // each parser is optional
        |> choice                       // on each iteration, one of the parsers must succeed
        |> many                         // a loop
    

    Note that the code above is capable of working with any source: strings, streams, or whatever. Your real app may need to work with files, but unit testing can be simplified by using just a string list.
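
To illustrate the point about sources, here is a minimal sketch that runs one parser over a string, a file, and a stream. The parser pNumbers and the helper unwrap are hypothetical stand-ins (not part of the answer above); run, runParserOnFile and runParserOnStream are FParsec's own runner functions. The `#r` line assumes the snippet is run as an F# script.

```fsharp
#r "nuget: FParsec"
open System.IO
open System.Text
open FParsec

// Hypothetical stand-in for pCombined: whitespace-separated floats.
let pNumbers : Parser<float list, unit> = spaces >>. many (pfloat .>> spaces)

// Hypothetical helper: unwraps a ParserResult, failing loudly on a parse error.
let unwrap result =
    match result with
    | Success (values, _, _) -> values
    | Failure (message, _, _) -> failwith message

let input = "1.5 2 3.25"

// 1. From an in-memory string (convenient in unit tests).
let fromString = run pNumbers input |> unwrap

// 2. From a file on disk (a temporary file, purely for the demo).
let path = Path.GetTempFileName()
File.WriteAllText(path, input)
let fromFile = runParserOnFile pNumbers () path Encoding.UTF8 |> unwrap
File.Delete(path)

// 3. From any System.IO.Stream.
let bytes = Encoding.UTF8.GetBytes(input)
let fromStream =
    use stream = new MemoryStream(bytes)
    runParserOnStream pNumbers () "in-memory stream" stream Encoding.UTF8 |> unwrap
```

All three calls share the same parser definition; only the runner function changes.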

    // Now, applying the parser somewhere in the code:
    let maybeParseResult =
        match run pCombined myStringData with
        | Success(result, _, _) -> Some result
        | Failure(_, _, _)      -> None // or anything that indicates general parse failure
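
The question asked for separate lists of ints and floats. Once parsing has produced a MyData list, one possible way to get them is sketched below. splitValues is a hypothetical helper name, and the sample list is hand-written to stand in for a successful parse result, so the snippet needs no parser to run.

```fsharp
// Same shape as the MyData type defined in the answer.
type MyData =
    | IntValue of int
    | FloatValue of float
    | Error

// Hypothetical helper: splits parsed values into (ints, floats),
// silently dropping the Error markers.
let splitValues (values: MyData list) : int list * float list =
    let ints = values |> List.choose (function IntValue i -> Some i | _ -> None)
    let floats = values |> List.choose (function FloatValue f -> Some f | _ -> None)
    ints, floats

// Hand-written sample standing in for the `result` of a successful parse.
let sample = [ IntValue 3; FloatValue 3.5; Error; IntValue 42 ]
let ints, floats = splitValues sample
```

Both lists keep the order in which the values appeared in the input, unlike the accumulating loop in the question, which builds them in reverse.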
    

    Update: I have edited the code according to the comments. pMyFloat was updated to ensure that the parsed value has a non-zero fractional part.
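
If values such as 42.0 should still count as floats, a different technique from the one above is to classify by the literal's text instead of its numeric value. The sketch below uses FParsec's numberLiteral for that; it is an alternative, not the code from this answer. The names numberFormat, pNumber and pNumbers are made up for the example, and the Error case is left out for brevity.

```fsharp
#r "nuget: FParsec"
open FParsec

type MyData =
    | IntValue of int
    | FloatValue of float

// Accept an optional minus sign, a fractional part and an exponent.
let numberFormat =
    NumberLiteralOptions.AllowMinusSign
    ||| NumberLiteralOptions.AllowFraction
    ||| NumberLiteralOptions.AllowExponent

// Decide from the shape of the literal: no fraction and no exponent means int.
let pNumber : Parser<MyData, unit> =
    numberLiteral numberFormat "number"
    |>> (fun literal ->
        if literal.IsInteger then IntValue (int literal.String) // throws on int32 overflow
        else FloatValue (float literal.String))

// Whitespace-separated numbers.
let pNumbers = spaces >>. many (pNumber .>> spaces)

let parsed =
    match run pNumbers "3 3.0 -7 2.5" with
    | Success (values, _, _) -> values
    | Failure (message, _, _) -> failwith message
```

With this approach "3" becomes an IntValue and "3.0" a FloatValue, so no backtracking between two separate number parsers is needed.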