Suppose I have two parsers p
and q
and I concatenate them like this:
r = try p *> q
In Parsec, the behavior of this is:
p
fails without consuming input, then r
fails without consuming input.p
fails after consuming input, then r
fails without consuming input.q
fails without consuming input, then r
fails after consuming p
.q
fails after consuming input, then r
fails after consuming p
and parts of q
.However, the behavior I'm looking for is a bit unusual:
p
fails without consuming input, then r
should fail without consuming input.p
fails after consuming input, then r
should fail without consuming input.q
fails without consuming input, then r
should fail without consuming input.q
fails after consuming input, then r
should fail after consuming some input.I can't seem to think of a clean way to do this.
The reason is that I have a parser like this:
s = (:) <$> q <*> many r
The q
parser, embedded inside the r
parser, needs a way to signal either: invalid input (which occurs when q
consumes input but fails), or end of the many
loop (which occurs when q
doesn't consume anything and fails). If the input is invalid, it should just fail the parser entirely and report the problem to the user. If there is no more input to consume, then it should end the many
loop (without reporting a parser error to the user). The trouble is that it's possible that the input ends with a p
but without any more valid q
's to consume, in which case q
will fail but without consuming any input.
So I was wondering if anyone have an elegant way to solve this problem? Thanks.
p = string "P"
q = (++) <$> try (string "xy") <*> string "z"
Test input on (hypothetical) parser s
, had it worked the way I wanted:
xyz
(accept)xyzP
(accept; P
remains unparsed)xyzPx
(accept; Px
remains unparsed; q
failed but did not consume any input)xyzPxy
(reject; parser q
consumed xy
but failed)xyzPxyz
(accept)In the form r = try p *> q
, s
will actually reject both case #2 and case #3 above. Of course, it's possible to achieve the above behavior by writing:
r = (++) <$> try (string "P" *> string "xy") <*> string "z"
but this isn't a general solution that works for any parsers p
and q
. (Perhaps a general solution doesn't exist?)
I believe I found a solution. It's not especially nice, but seems to works. At least something to start with:
{-# LANGUAGE FlexibleContexts #-}
import Control.Applicative hiding (many, (<|>))
import Control.Monad (void)
import Control.Monad.Trans (lift)
import Control.Monad.Trans.Maybe
import Text.Parsec hiding (optional)
import Text.Parsec.Char
import Text.Parsec.String
rcomb :: (Stream s m t) => ParsecT s u m a -> ParsecT s u m b -> ParsecT s u m b
rcomb p q = ((test $ opt p *> opt q) <|> pure (Just ()))
>>= maybe empty (\_ -> p *> q)
where
-- | Converts failure to @MaybeT Nothing@:
opt = MaybeT . optional -- optional from Control.Applicative!
-- | Tests running a parser, returns Nothing if parsers failed consuming no
-- input, Just () otherwise.
test = lookAhead . try . runMaybeT . void
This is the r
combinator you're asking for. The idea is that we first execute the parsers in a "test" run (using lookAhead . try
) and if any of them fails without consuming input, we record it as Nothing
inside MaybeT
. This is accomplished by opt
, it converts a failure to Nothing
and wraps it into MaybeT
. Thanks to MaybeT
, if opt p
returns Nothing
, opt q
is skipped.
If both p
and q
succeed, the test ..
part returns Just ()
. And if one of them consumes input, the whole test ..
fails. This way, we distinguish the 3 possibilities:
p
or q
.After <|> pure (Just ())
both 1. and 3. result in Just ()
, while 2. results in Nothing. Finally, the maybe
part converts Nothing
into a non-consuming failure, and Just ()
into running the parsers again, now without any protection. This means that 1. fails again with consuming some input, and 3. succeeds.
Testing:
samples =
[ "xyz" -- (accept)
, "xyzP" -- (accept; P remains unparsed)
, "xyzPz" -- (accept; Pz remains unparsed)
, "xyzPx" -- (accept; Px remains unparsed; q failed but did not consume any input)
, "xyzPxy" -- (reject; parser q consumed xy but failed)
, "xyzPxyz" -- (accept)
]
main = do
-- Runs a parser and then accept anything, which shows what's left in the
-- input buffer:
let run p x = runP ((,) <$> p <*> many anyChar) () x x
let p, q :: Parser String
p = string "P"
q = (++) <$> try (string "xy") <*> string "z"
let parser = show <$> ((:) <$> q <*> many (rcomb p q))
mapM_ (print . run parser) samples