Tags: haskell, parsec, megaparsec

How to parse Number with comma via Megaparsec


Currently I have a parser:

pScientific :: Parser Scientific
pScientific = lexeme L.scientific

This easily parses something like 4087.00,

but it fails on the number 4,087.00. Is there a way to make Megaparsec parse numbers with commas?

PS: I am very new to Haskell, so apologies if this is a stupid question.


Solution

  • This is not parsed because the scientific parser is mainly designed for JSON-style numbers. JSON does not allow thousands separators, since a comma separates the elements of arrays and objects.

    We can take a look at the implementation of scientific [src]:

    -- | Parse a JSON number.
    scientific :: Parser Scientific
    scientific = do
      sign <- A.peekWord8'
      let !positive = not (sign == W8_MINUS)
      when (sign == W8_PLUS || sign == W8_MINUS) $
        void A.anyWord8
    
      n <- decimal0
    
      let f fracDigits = SP (B.foldl' step n fracDigits)
                            (negate $ B.length fracDigits)
          step a w = a * 10 + fromIntegral (w - W8_0)
    
      dotty <- A.peekWord8
      SP c e <- case dotty of
                  Just W8_DOT -> A.anyWord8 *> (f <$> A.takeWhile1 isDigit_w8)
                  _           -> pure (SP n 0)
    
      let !signedCoeff | positive  =  c
                       | otherwise = -c
    
      (A.satisfy (\ex -> case ex of W8_e -> True; W8_E -> True; _ -> False) *>
          fmap (Sci.scientific signedCoeff . (e +)) (signed decimal)) <|>
        return (Sci.scientific signedCoeff    e)
    {-# INLINE scientific #-}
    

    The main thing to change is the decimal0 part, which parses a run of one or more decimal digits (and rejects a leading zero). We can, for example, implement a comma-tolerant variant with:

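    import qualified Data.Attoparsec.ByteString as A
    import Data.Attoparsec.ByteString.Char8 (isDigit_w8)
    import qualified Data.ByteString as B
    
    -- Like aeson's decimal0, but also accepts commas (byte 44) in the digit run.
    decimal0' :: A.Parser Integer
    decimal0' = do
      -- take digits and commas, then drop the commas
      digits <- B.filter (/= 44) <$> A.takeWhile1 (\x -> isDigit_w8 x || x == 44)
      if B.null digits
        then fail "no digits"
        else if B.length digits > 1 && B.head digits == 48   -- 48 = '0'
          then fail "leading zero"
          -- aeson's bsToInteger is internal, so fold the digits ourselves
          else return (B.foldl' (\acc w -> acc * 10 + fromIntegral (w - 48)) 0 digits)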
    

    and then use it in a modified copy of scientific:

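    {-# LANGUAGE BangPatterns #-}
    import Control.Applicative ((<|>))
    import Control.Monad (void, when)
    import qualified Data.Attoparsec.ByteString as A
    import Data.Attoparsec.ByteString.Char8 (decimal, isDigit_w8, signed)
    import qualified Data.ByteString as B
    import Data.Scientific (Scientific)
    import qualified Data.Scientific as Sci
    
    -- Strict pair of coefficient and exponent (aeson's SP is internal, so define our own).
    data SP = SP !Integer !Int
    
    -- | Parse a JSON-style number, allowing commas in the integral part.
    -- Uses decimal0' from above; raw byte values replace aeson's W8_* patterns.
    scientific' :: A.Parser Scientific
    scientific' = do
      sign <- A.peekWord8'
      let !positive = sign /= 45                -- 45 = '-'
      when (sign == 43 || sign == 45) $         -- 43 = '+'
        void A.anyWord8
    
      n <- decimal0'                            -- integral part, commas allowed
    
      let f fracDigits = SP (B.foldl' step n fracDigits)
                            (negate $ B.length fracDigits)
          step a w = a * 10 + fromIntegral (w - 48)   -- 48 = '0'
    
      dotty <- A.peekWord8
      SP c e <- case dotty of
                  Just 46 -> A.anyWord8 *> (f <$> A.takeWhile1 isDigit_w8)  -- 46 = '.'
                  _       -> pure (SP n 0)
    
      let !signedCoeff | positive  =  c
                       | otherwise = -c
    
      (A.satisfy (\ex -> ex == 101 || ex == 69) *>    -- 101 = 'e', 69 = 'E'
          fmap (Sci.scientific signedCoeff . (e +)) (signed decimal)) <|>
        return (Sci.scientific signedCoeff e)
    {-# INLINE scientific' #-}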
    

    This does not check that commas only separate groups of three digits (for example, 4,0,87 is accepted as 4087), so that would require extra logic, but it is a basic implementation that accepts commas in the integral part of the Scientific.
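Since the question uses Megaparsec rather than Attoparsec, here is a minimal sketch of the same idea written directly with Megaparsec combinators, which also enforces the three-digit grouping. It is not a drop-in replacement for L.scientific: it assumes a `Parsec Void Text` parser type, accepts only an optional leading minus sign, and skips exponents such as 1e3. The names pGroupedDigits and pGroupedScientific are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Scientific (Scientific, scientific)
import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char, digitChar)

-- Assumption: the same parser type as the question's Parser.
type Parser = Parsec Void Text

-- Integral part, either plain ("4087") or grouped with commas ("4,087").
-- Assumption: the first group has 1-3 digits and every later group exactly 3.
pGroupedDigits :: Parser String
pGroupedDigits = do
  first <- some digitChar
  if length first > 3
    then pure first -- more than 3 leading digits: treat as an ungrouped number
    else do
      -- each ",ddd" group must not be followed by a 4th digit;
      -- try backtracks so a malformed group like ",08" is left unconsumed
      rest <- many (try (char ',' *> count 3 digitChar <* notFollowedBy digitChar))
      pure (first ++ concat rest)

-- Optional '-', grouped integral part, optional ".ddd" fraction (no exponent).
pGroupedScientific :: Parser Scientific
pGroupedScientific = do
  sign <- option id (negate <$ char '-')
  intDigits <- pGroupedDigits
  fracDigits <- option "" (char '.' *> some digitChar)
  -- coefficient is all digits; exponent shifts by the fraction length
  let coeff = read (intDigits ++ fracDigits) :: Integer
  pure (sign (scientific coeff (negate (length fracDigits))))

main :: IO ()
main = mapM_ (print . parseMaybe pGroupedScientific) ["4,087.00", "4087.00", "4,08.00"]
```

Because parseMaybe also requires the whole input to be consumed, "4,08.00" is rejected while "4,087.00" and "4087.00" both yield 4087. To use it in place of the original parser, something like pScientific = lexeme pGroupedScientific should work, assuming lexeme is the usual space-consuming wrapper from the question.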