Search code examples
haskellparsec

how to write a backtracking concatenated parser in haskell


I'm trying to get a parser created from concatenated parsers to backtrack in parsec.

here's the code:

ab = (try $ char 'a') <|> (try $ char 'b')
cd = (try $ char 'b') <|> (try $ char 'c')
abcd = (try $ many1 ab) >> (try cd)

here, when I run

parse abcd "parser"

with the input "aaaaaaaaaaaaaac", it works, but it errors on "aaaaaaaaaaaaab", which should be acceptable.

Any insights as to what I can do to get this to work

Thanks in advance


Solution

  • So you're trying to write the regular expression (a|b)+(b|c) in Haskell? This is how you would do it:

    import Text.Parsec.String
    import Text.Parsec
    
    ab :: Parser Char
    ab = char 'a' <|> char 'b'
    
    bc :: Parser Char
    bc = char 'b' <|> char 'c'
    
    abc :: Parser String
    abc = do
        a <- ab
        b <- bc
        return [a,b]
    
    abbc :: Parser String
    abbc = try abc <|> do
        c <- ab
        s <- abbc
        return (c:s)
    
    parser :: String -> Either ParseError String
    parser = parse abbc "(a|b)+(b|c)"
    

    Now load up ghci and run the code as follows:

    aaditmshah@home:~$ ghci
    GHCi, version 7.6.3: http://www.haskell.org/ghc/  :? for help
    Loading package ghc-prim ... linking ... done.
    Loading package integer-gmp ... linking ... done.
    Loading package base ... linking ... done.
    Prelude> :load Test.hs
    [1 of 1] Compiling Main             ( Test.hs, interpreted )
    Ok, modules loaded: Main.
    *Main> parser "aaaaaaaaaaaaaac"
    Loading package array-0.4.0.1 ... linking ... done.
    Loading package deepseq-1.3.0.1 ... linking ... done.
    Loading package bytestring-0.10.0.2 ... linking ... done.
    Loading package transformers-0.3.0.0 ... linking ... done.
    Loading package mtl-2.1.2 ... linking ... done.
    Loading package text-0.11.3.1 ... linking ... done.
    Loading package parsec-3.1.5 ... linking ... done.
    Right "aaaaaaaaaaaaaac"
    *Main> parser "aaaaaaaaaaaaaab"
    Right "aaaaaaaaaaaaaab"
    *Main> 
    Leaving GHCi.
    

    How does this work? Let's understand it parser by parser:

    ab :: Parser Char
    ab = char 'a' <|> char 'b'
    
    bc :: Parser Char
    bc = char 'b' <|> char 'c'
    

    These two parsers are simple. They correspond to the regular expressions (a|b) and (b|c) respectively.

    abc :: Parser String
    abc = do
        a <- ab
        b <- bc
        return [a,b]
    

    This is also relatively simple. It corresponds to the regular expression (a|b)(b|c).

    abbc :: Parser String
    abbc = try abc <|> do
        c <- ab
        s <- abbc
        return (c:s)
    

    This is where things get complicated. This parser does the following:

    1. It tries to match the input with the regular expression (a|b)(b|c).
    2. If it fails then it doesn't consume any input. Instead it matches the first character of input with (a|b) and then goes back to step 1 to match the rest of the input.

    Hence it corresponds to the regular expression (a|b)*(a|b)(b|c) which is the same as (a|b)+(b|c).

    The try combinator is important. Without it the parser will try to match the input with (a|b)(b|c). Failing to do so it will throw an error instead of backtracking and trying the alternate condition.