Tags: parsing, f#, fparsec

How can I parse homogeneous lists in FParsec?


I'm having trouble parsing a homogeneous JSON-like array in FParsec. I've reduced the problem to a short example that reproduces it.

#r @"..\packages\FParsec.1.0.2\lib\net40-client\FParsecCS.dll"
#r @"..\packages\FParsec.1.0.2\lib\net40-client\FParsec.dll"

open System
open FParsec

let test p str =
        match run p str with
        | Success(result, _, _)   -> printfn "Success: %A" result
        | Failure(errormsg, _, _) -> printfn "Failure: %s" errormsg


type CValue = CInt of int64
            | CBool of bool
            | CList of CValue list

let P_WHITESPACE = spaces
let P_COMMA = pstring ","
let P_L_SBRACE = pstring "[" .>> P_WHITESPACE
let P_R_SBRACE = P_WHITESPACE >>. pstring "]"

let P_INT_VALUE = pint64 |>> CInt

let P_TRUE = stringReturn "true" (CBool true)
let P_FALSE = stringReturn "false" (CBool false)
let P_BOOL_VALUE = P_TRUE <|> P_FALSE


let P_LIST_VALUE =
    let commaDelimitedList ptype = sepBy (ptype .>> P_WHITESPACE) (P_COMMA .>> P_WHITESPACE)
    let delimitedList = (commaDelimitedList P_INT_VALUE) <|> (commaDelimitedList P_BOOL_VALUE)
    let enclosedList = between P_L_SBRACE P_R_SBRACE delimitedList
    enclosedList |>> CList

When I use the test function to try it out, I get the following results:

test P_LIST_VALUE "[1,2,3]"
Success: CList [CInt 1L; CInt 2L; CInt 3L]

test P_LIST_VALUE "[true,false]"
Failure: Error in Ln: 1 Col: 2
[true,false]
 ^
Expecting: integer number (64-bit, signed) or ']'

If I swap the order of P_INT_VALUE and P_BOOL_VALUE in the <|> expression, then [true,false] parses successfully but [1,2,3] fails with a similar error. In other words, whichever parser comes first is the only one that ever gets used.

I understand the <|> operator won't attempt the RHS parser if the LHS changes the parser state (for example, by consuming input) before failing, but I can't see how that could be happening. P_BOOL_VALUE and P_INT_VALUE don't have any starting characters in common, so whichever one doesn't match should fail immediately without consuming anything. Ints never start with 'true' or 'false', and bools never start with numeric digits.

What am I doing wrong?


Solution

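  • Ah, I've figured it out. The hint is the "or ']'" part of the error message. The problem is that sepBy succeeds on empty input: when commaDelimitedList P_INT_VALUE hits the t of true, P_INT_VALUE fails without consuming anything, so sepBy returns successfully with an empty list. Because the left-hand side of <|> succeeded, the bool alternative is never tried, and control passes back to between, which tries and fails to find a closing ].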

    The solution is to move the empty list case out of the int/bool-specific parsers, like this:

    let P_LIST_VALUE =
        let commaDelimitedList ptype = sepBy1 (ptype .>> P_WHITESPACE) (P_COMMA .>> P_WHITESPACE)
        let delimitedList = (commaDelimitedList P_INT_VALUE) <|> (commaDelimitedList P_BOOL_VALUE) <|> preturn []
        let enclosedList = between P_L_SBRACE P_R_SBRACE delimitedList
        enclosedList |>> CList
    

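    Note the use of sepBy1 instead of sepBy: sepBy1 requires at least one element, so on [true,false] the int branch now fails without consuming input and <|> moves on to the bool branch. The added <|> preturn [] handles the empty list exactly once, in delimitedList.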

    As a side note, I don't know your exact application, but it is generally not a good idea to enforce typing in the parser. A more common approach is to parse commaDelimitedList (P_INT_VALUE <|> P_BOOL_VALUE) (with your original, sepBy-based commaDelimitedList) and then check the types in a subsequent analysis phase.
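The script below is a sketch, not part of the original answer, that puts three variants side by side so the behaviour can be checked: the original sepBy version, the sepBy1 fix, and the parse-then-check approach from the side note. Assumptions: it runs under dotnet fsi (F# 5 or later) and loads FParsec with #r "nuget: FParsec" instead of the package paths used in the question. The names original, fixedVersion, mixed, listOf, tryParse, kindOf, checkHomogeneous and parseAndCheck are hypothetical helpers introduced only for illustration. The user-state type is annotated as unit so the top-level parsers do not trigger F#'s value restriction.

```fsharp
// Assumption: run with `dotnet fsi` (F# 5+), which supports NuGet references in scripts.
#r "nuget: FParsec"

open FParsec

type CValue =
    | CInt of int64
    | CBool of bool
    | CList of CValue list

// Annotating the user state as unit keeps these top-level parsers non-generic.
let P_WHITESPACE : Parser<unit, unit> = spaces
let P_COMMA : Parser<string, unit> = pstring ","
let P_L_SBRACE = pstring "[" .>> P_WHITESPACE
let P_R_SBRACE = P_WHITESPACE >>. pstring "]"
let P_INT_VALUE : Parser<CValue, unit> = pint64 |>> CInt
let P_BOOL_VALUE : Parser<CValue, unit> =
    stringReturn "true" (CBool true) <|> stringReturn "false" (CBool false)

// Hypothetical helper: wraps a list-body parser in brackets and builds a CList.
let listOf (delimitedList : Parser<CValue list, unit>) =
    between P_L_SBRACE P_R_SBRACE delimitedList |>> CList

// Original version: on "[true,false]" sepBy returns [] without consuming input,
// so <|> never tries the bool branch and between then fails to find ']'.
let original =
    let commaDelimitedList ptype = sepBy (ptype .>> P_WHITESPACE) (P_COMMA .>> P_WHITESPACE)
    listOf (commaDelimitedList P_INT_VALUE <|> commaDelimitedList P_BOOL_VALUE)

// Fixed version from the answer: sepBy1 per element type, empty list handled once.
let fixedVersion =
    let commaDelimitedList ptype = sepBy1 (ptype .>> P_WHITESPACE) (P_COMMA .>> P_WHITESPACE)
    listOf (commaDelimitedList P_INT_VALUE <|> commaDelimitedList P_BOOL_VALUE <|> preturn [])

// Two-phase alternative, step 1: accept any mix of ints and bools.
let mixed =
    let commaDelimitedList ptype = sepBy (ptype .>> P_WHITESPACE) (P_COMMA .>> P_WHITESPACE)
    listOf (commaDelimitedList (P_INT_VALUE <|> P_BOOL_VALUE))

// Names the kind of a value for error messages.
let kindOf v =
    match v with
    | CInt _ -> "int"
    | CBool _ -> "bool"
    | CList _ -> "list"

// Two-phase alternative, step 2: reject lists whose elements differ in kind.
// Result.Ok/Result.Error are qualified because FParsec defines its own Ok/Error literals.
let checkHomogeneous v =
    match v with
    | CList (first :: rest) when rest |> List.exists (fun x -> kindOf x <> kindOf first) ->
        let bad = rest |> List.find (fun x -> kindOf x <> kindOf first)
        Result.Error (sprintf "expected %s but found %s" (kindOf first) (kindOf bad))
    | _ -> Result.Ok v

// Runs a parser and returns Some result on success, None on failure.
let tryParse p str =
    match run p str with
    | Success (result, _, _) -> Some result
    | Failure _ -> None

// Parses with the permissive parser, then applies the type check.
let parseAndCheck str =
    match run mixed str with
    | Success (v, _, _) -> checkHomogeneous v
    | Failure (msg, _, _) -> Result.Error msg
```

With this script, tryParse original "[true,false]" gives None (the failure from the question), tryParse fixedVersion "[true, false]" gives Some (CList [CBool true; CBool false]), and parseAndCheck "[1,true]" gives Result.Error "expected int but found bool". The two-phase version keeps the grammar to a single list rule and reports the type mismatch in terms of the values involved, at the cost of one extra pass over the parsed result.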