parsing haskell parsec parser-combinators

How to preserve factorial computing when parsing string with Parsec in Haskell?

I ran into difficulties trying resolve of factorial output calculation with Text.Parsec in Haskell. Let's have a look at some code I use so far:

import Text.Parsec hiding(digit)
import Data.Functor

type CalcParser a = CalcParsec String () a

digit :: CalcParser Char
digit = oneOf ['0'..'9']

number :: CalcParser Double
number = try calcDouble <|> calcInt  

calcInt :: CalcParser Double 
calcInt = read <$> many1 digit

calcDouble :: CalcParser Double 
calcDouble = do
    integ <- many1 digit
    char '.'
    decim <- many1 digit
    pure $ read $ integ <> "." <> decim

numberFact :: CalcParser Integer     
numberFact = read <$> many1 digit

applyMany :: a -> [a -> a] -> a
applyMany x [] = x
applyMany x (h:t) = applyMany (h x) t

div_ :: CalcParser (Double -> Double -> Double)
div_= do
    char '/'
    return (/)

star :: CalcParser (Double -> Double -> Double)
star = do
    char '*'    
    return (*)  

plus :: CalcParser (Double -> Double -> Double)
plus = do
    char '+'    
    return (+)

minus :: CalcParser (Double -> Double -> Double)
minus = do
    char '-'    
    return (-)  

multiplication :: CalcParser Double
multiplication = do
    spaces
    lhv <- atom <|> fact' <|> negation'
    spaces
    t <- many tail
    return $ applyMany lhv t
    where tail =
                do
                    f <- star <|> div_
                    spaces
                    rhv <- atom <|> fact' <|> negation'
                    spaces
                    return (`f` rhv)

atom :: CalcParser Double
atom = number <|> do
    char '('
    res <- addition
    char ')'
    return res

fact' :: CalcParser Double
fact' = do
    spaces
    rhv <- numberFact
    char '!'
    return $ factorial rhv

factorial :: Integer -> Double
factorial n
    | n < 0 = error "No factorial exists for negative inputs" 
    | n == 0 || n == 1 = 1
    | otherwise = acc n 1
    where
    acc 0 a = a
    acc b a = acc (b-1) (fromIntegral b * a) 

negation' :: CalcParser Double
negation' = do
    spaces
    char '~'
    rhv <- atom
    spaces
    return $ negate rhv         

addition :: CalcParser Double
addition = do
    spaces
    lhv <- multiplication <|> fact' <|> negation'
    spaces
    t <- many tail
    return $ applyMany lhv t
    where tail =
                do
                    f <- plus <|> minus 
                    spaces
                    rhv <- multiplication <|> fact' <|> negation'
                    spaces
                    return (`f` rhv)

It's a kind of a simple calculator parser that provides with addition / subtraction / factorial processing. I gonna let input any factorial-linked components of string chunk so that each of them to consist of some number followed by '!' character (as normally in most real world calculators). However, when running a parser test as:

parseTest addition "3!"

it fails to compute factorial itself, but returns 3.0 (output to be presented as a floating point number, which is why I use CalcParser Double in this program). A bizarr fact is that whenever I set '!' in prior to a number:

fact' :: CalcParser Double
fact' = do
    spaces
    char '!'
    rhv <- numberFact
    return $ factorial rhv

the result of:

parseTest addition "!3"

meets my expectations, i.e. it equals 6.0. Following that, I assume there is some setback in the first version of stuff which fails to run factorial computing whenever '!' is a next to some number-like input elements. That's just a case I'm looking for any help to solve here. So, what's wrong with right-side '!' and how would you handle the problem decribed?

Solution

The general problem is that Parsec stops trying alternatives when it finds the first one that works. If you write:

parseTest (atom <|> fact') "3!"

this will result in 3.0. This is because the atom parser successfully parses the initial part of the string "3", so Parsec doesn't even try the fact' parser. It leaves the rest of the string "!" for another, later parser to handle.

In your parser code above, if you try the parse:

parseTest addition "3!"

the addition parser starts by trying the multiplication parser. The multiplication parser has the line:

lhv <- atom <|> fact' <|> negation'

so it starts by trying the atom parser. This parser works fine and returns 3.0, so it never bothers trying fact' or negation.

To fix your parser, you need to make sure that you don't successfuly parse an atom in an alternative before trying fact'. You could start by switching the order:

> parseTest (fact' <|> atom) "3!"
6.0

This works fine to parse "3!" (and gives 6.0), but it fails if you try to parse something else:

> parseTest (fact' <|> atom) "3"
parse error at (line 1, column 2):
unexpected end of input
expecting "!"

This is because, for efficiency reasons, Parsec doesn't automatically "backtrack". If you try to parse something and it "consumes input" before failing, it fails completely instead of trying more alternatives. Here, fact' starts by calling numberFact which successfully "consumes" the "3", and then it tries char '!', which fails. So, fact "fails after consuming input" which leads to an immediate parse error.

You can override this behavior using the try function:

> parseTest (try fact' <|> atom) "3"
3.0
> parseTest (try fact' <|> atom) "3!"
6.0

Here try fact' applies the parser fact' but treats "fail after consuming input" as if it was "fail after consuming no input", so additional alternatives can be checked.

If you applied this change to both places in your multiplication parser:

multiplication :: CalcParser Double
multiplication = do
    spaces
    lhv <- try fact' <|> atom <|> negation'
    spaces
    t <- many tail
    return $ applyMany lhv t
    where tail =
                do
                    f <- star <|> div_
                    spaces
                    rhv <- try fact' <|> atom <|> negation'
                    spaces
                    return (`f` rhv)

and made similar changes to your addition parser:

addition :: CalcParser Double
addition = do
    spaces
    lhv <- try fact' <|> multiplication <|> negation'
    spaces
    t <- many tail
    return $ applyMany lhv t
    where tail =
                do
                    f <- plus <|> minus
                    spaces
                    rhv <- try fact' <|> multiplication <|> negation'
                    spaces
                    return (`f` rhv)

then it would work better:

> parseTest addition "3!"
6.0
> parseTest addition "3"
3.0
> parseTest addition "3+2*6!"
1443.0

It's also a good idea to add an eof parser to your test to make sure that you don't have any left over trailing garbage that wasn't parsed:

> parseTest addition "3  Hey, I could write anything here"
3.0
> parseTest (addition <* eof) "3  but adding eof will generate an error"
parse error at (line 1, column 4):
unexpected 'b'
expecting space, "*", "/", white space, "+", "-" or end of input