I'm trying to get a parser created from concatenated parsers to backtrack in parsec.
here's the code:
ab = (try $ char 'a') <|> (try $ char 'b')
cd = (try $ char 'b') <|> (try $ char 'c')
abcd = (try $ many1 ab) >> (try cd)
here, when I run
parse abcd "parser"
with the input "aaaaaaaaaaaaaac", it works, but it errors on "aaaaaaaaaaaaab", which should be acceptable.
Any insights as to what I can do to get this to work
Thanks in advance
So you're trying to write the regular expression (a|b)+(b|c)
in Haskell? This is how you would do it:
import Text.Parsec.String
import Text.Parsec
ab :: Parser Char
ab = char 'a' <|> char 'b'
bc :: Parser Char
bc = char 'b' <|> char 'c'
abc :: Parser String
abc = do
a <- ab
b <- bc
return [a,b]
abbc :: Parser String
abbc = try abc <|> do
c <- ab
s <- abbc
return (c:s)
parser :: String -> Either ParseError String
parser = parse abbc "(a|b)+(b|c)"
Now load up ghci
and run the code as follows:
aaditmshah@home:~$ ghci
GHCi, version 7.6.3: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> :load Test.hs
[1 of 1] Compiling Main ( Test.hs, interpreted )
Ok, modules loaded: Main.
*Main> parser "aaaaaaaaaaaaaac"
Loading package array-0.4.0.1 ... linking ... done.
Loading package deepseq-1.3.0.1 ... linking ... done.
Loading package bytestring-0.10.0.2 ... linking ... done.
Loading package transformers-0.3.0.0 ... linking ... done.
Loading package mtl-2.1.2 ... linking ... done.
Loading package text-0.11.3.1 ... linking ... done.
Loading package parsec-3.1.5 ... linking ... done.
Right "aaaaaaaaaaaaaac"
*Main> parser "aaaaaaaaaaaaaab"
Right "aaaaaaaaaaaaaab"
*Main>
Leaving GHCi.
How does this work? Let's understand it parser by parser:
ab :: Parser Char
ab = char 'a' <|> char 'b'
bc :: Parser Char
bc = char 'b' <|> char 'c'
These two parsers are simple. They correspond to the regular expressions (a|b)
and (b|c)
respectively.
abc :: Parser String
abc = do
a <- ab
b <- bc
return [a,b]
This is also relatively simple. It corresponds to the regular expression (a|b)(b|c)
.
abbc :: Parser String
abbc = try abc <|> do
c <- ab
s <- abbc
return (c:s)
This is where things get complicated. This parser does the following:
(a|b)(b|c)
.(a|b)
and then goes back to step 1 to match the rest of the input.Hence it corresponds to the regular expression (a|b)*(a|b)(b|c)
which is the same as (a|b)+(b|c)
.
The try
combinator is important. Without it the parser will try to match the input with (a|b)(b|c)
. Failing to do so it will throw an error instead of backtracking and trying the alternate condition.