Search code examples
f#fparsec

Parsing numbers in FParsec


I've started learning FParsec. It has a very flexible way to parse numbers; I can provide a set of number formats I want to use:

type Number =
    | Numeral of int
    | Decimal of float
    | Hexadecimal of int
    | Binary of int

let numberFormat = NumberLiteralOptions.AllowFraction
                   ||| NumberLiteralOptions.AllowHexadecimal
                   ||| NumberLiteralOptions.AllowBinary

let pnumber = 
    numberLiteral numberFormat "number"
    |>> fun num -> if num.IsHexadecimal then Hexadecimal (int num.String)
                   elif num.IsBinary then Binary (int num.String)
                   elif num.IsInteger then Numeral (int num.String)
                   else Decimal (float num.String)

However, the language I'm trying to parse is a bit strange. A number could be numeral (non-negative int), decimal (non-negative float), hexadecimal (with prefix #x) or binary (with prefix #b):

numeral: 0, 2
decimal: 0.2, 2.0
hexadecimal: #xA04, #x611ff
binary: #b100, #b001

Right now I have to do parsing twice by substituting # by 0 (if necessary) to make use of pnumber:

let number: Parser<_, unit> =  
    let isDotOrDigit c = isDigit c || c = '.'
    let numOrDec = many1Satisfy2 isDigit isDotOrDigit 
    let hexOrBin = skipChar '#' >>. manyChars (letter <|> digit) |>> sprintf "0%s"
    let str = spaces >>. numOrDec <|> hexOrBin
    str |>> fun s -> match run pnumber s with
                     | Success(result, _, _)   -> result
                     | Failure(errorMsg, _, _) -> failwith errorMsg

What is a better way of parsing in this case? Or how can I alter FParsec's CharStream to be able to make conditional parsing easier?


Solution

  • Parsing numbers can be pretty messy if you want to generate good error messages and properly check for overflows.

    The following is a simple FParsec implementation of your number parser:

    let numeralOrDecimal : Parser<_, unit> =
        // note: doesn't parse a float exponent suffix
        numberLiteral NumberLiteralOptions.AllowFraction "number" 
        |>> fun num -> 
                // raises an exception on overflow
                if num.IsInteger then Numeral(int num.String)
                else Decimal(float num.String)
    
    let hexNumber =    
        pstring "#x" >>. many1SatisfyL isHex "hex digit"
        |>> fun hexStr -> 
                // raises an exception on overflow
                Hexadecimal(System.Convert.ToInt32(hexStr, 16)) 
    
    let binaryNumber =    
        pstring "#b" >>. many1SatisfyL (fun c -> c = '0' || c = '1') "binary digit"
        |>> fun hexStr -> 
                // raises an exception on overflow
                Binary(System.Convert.ToInt32(hexStr, 2))
    
    
    let number =
        choiceL [numeralOrDecimal
                 hexNumber
                 binaryNumber]
                "number literal"
    

    Generating good error messages on overflows would complicate this implementation a bit, as you would ideally also need to backtrack after the error, so that the error position ends up at the start of the number literal (see the numberLiteral docs for an example).

    A simple way to gracefully handle possible overflow exception is to use a little exception handling combinator like the following:

    let mayThrow (p: Parser<'t,'u>) : Parser<'t,'u> =
        fun stream ->
            let state = stream.State        
            try 
                p stream
            with e -> // catching all exceptions is somewhat dangerous
                stream.BacktrackTo(state)
                Reply(FatalError, messageError e.Message)
    

    You could then write

    let number = mayThrow (choiceL [...] "number literal")
    

    I'm not sure what you meant to say with "alter FParsec's CharStream to be able to make conditional parsing easier", but the following sample demonstrates how you could write a low-level implementation that only uses the CharStream methods directly.

    type NumberStyles = System.Globalization.NumberStyles
    let invariantCulture = System.Globalization.CultureInfo.InvariantCulture
    
    let number: Parser<Number, unit> =
      let expectedNumber = expected "number"
      let inline isBinary c = c = '0' || c = '1'
      let inline hex2int c = (int c &&& 15) + (int c >>> 6)*9
    
      let hexStringToInt (str: string) = // does no argument or overflow checking        
          let mutable n = 0
          for c in str do
              n <- n*16 + hex2int c
          n    
    
      let binStringToInt (str: string) = // does no argument or overflow checking
          let mutable n = 0
          for c in str do
              n <- n*2 + (int c - int '0')
          n
    
      let findIndexOfFirstNonNull (str: string) =
          let mutable i = 0
          while i < str.Length && str.[i] = '0' do
              i <- i + 1
          i
    
      let isHexFun = id isHex // tricks the compiler into caching the function object
      let isDigitFun = id isDigit
      let isBinaryFun = id isBinary
    
      fun stream ->
        let start = stream.IndexToken
        let cs = stream.Peek2()        
        match cs.Char0, cs.Char1 with
        | '#', 'x' ->
            stream.Skip(2)
            let str = stream.ReadCharsOrNewlinesWhile(isHexFun, false)
            if str.Length <> 0 then
                let i = findIndexOfFirstNonNull str
                let length = str.Length - i
                if length < 8 || (length = 8 && str.[i] <= '7') then
                    Reply(Hexadecimal(hexStringToInt str))
                else
                    stream.Seek(start)
                    Reply(Error, messageError "hex number literal is too large for 32-bit int")
            else 
                Reply(Error, expected "hex digit")
    
        | '#', 'b' ->
            stream.Skip(2)
            let str = stream.ReadCharsOrNewlinesWhile(isBinaryFun, false)
            if str.Length <> 0 then
                let i = findIndexOfFirstNonNull str
                let length = str.Length - i
                if length < 32 then 
                    Reply(Binary(binStringToInt str))
                else
                    stream.Seek(start)
                    Reply(Error, messageError "binary number literal is too large for 32-bit int")
            else 
                Reply(Error, expected "binary digit")
    
        | c, _ ->
            if not (isDigit c) then Reply(Error, expectedNumber)
            else
                stream.SkipCharsOrNewlinesWhile(isDigitFun) |> ignore
                if stream.Skip('.') then
                    let n2 = stream.SkipCharsOrNewlinesWhile(isDigitFun)
                    if n2 <> 0 then
                        // we don't parse any exponent, as in the other example
                        let mutable result = 0.
                        if System.Double.TryParse(stream.ReadFrom(start), 
                                                  NumberStyles.AllowDecimalPoint,
                                                  invariantCulture, 
                                                  &result)
                        then Reply(Decimal(result))
                        else 
                            stream.Seek(start)
                            Reply(Error, messageError "decimal literal is larger than System.Double.MaxValue")                    
                    else 
                        Reply(Error, expected "digit")
                else
                   let decimalString = stream.ReadFrom(start)
                   let mutable result = 0
                   if System.Int32.TryParse(stream.ReadFrom(start),
                                            NumberStyles.None,
                                            invariantCulture,
                                            &result)
                   then Reply(Numeral(result))
                   else 
                       stream.Seek(start)
                       Reply(Error, messageError "decimal number literal is too large for 32-bit int")
    

    While this implementation parses hex and binary numbers without the help of system methods, it eventually delegates the parsing of decimal numbers to the Int32.TryParse and Double.TryParse methods.

    As I said: it's messy.