Reading list of statements and ending with a single expression, when statements can be expressions


I've run into a problem where I want to parse a block of code with the following syntax:

{
    <stmt>;
    <stmt>;
    <stmt>;
    <expr>
}

A statement can be of the form <expr>;. This trips up Parsec in a way which I don't know how to fix. This is probably just me being kinda new to Haskell and the Parsec library, but I don't know where to search for a solution to the problem. I've written an example that captures my exact problem.

With the input { 5; 5; 5 } it fails on the third 5, because it expects there to be a ; present. How do I get around this?

import           Text.ParserCombinators.Parsec
import           Text.ParserCombinators.Parsec.Combinator

parseIdentifier = do
    first <- letter
    rest  <- many $ letter <|> digit <|> char '_'
    return $ first : rest

parseExpr = parseIdentifier <|> many1 digit


parseStmt = parseExpr <* char ';'

parseBlock = between
    (char '{' >> spaces)
    (spaces >> char '}')
    (do
        stmts <- try $ parseStmt `sepBy` spaces
        parseExpr
    )

readParser :: Parser String -> String -> String
readParser parser input = case parse parser "dusk" input of
    Left  err -> show err
    Right val -> val

main = interact $ readParser parseBlock

Solution

  • Instead of sepBy, this sort of problem can often be solved with manyTill. The tricky point is to keep the final expression from being consumed by manyTill's end parser, which is what try $ lookAhead is for.

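    Side note: the reason can be found in the source code of Parsec. Internally, manyTill uses <|> to choose between the end parser and the item parser. That is why try is needed: without it, an end parser that fails after consuming input stops <|> from trying the item parser. lookAhead makes a successful end parser consume no input, so the final expression is still there for whatever follows in the >>= or >> chain.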
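The side note can be made concrete with a small sketch. manyTill' below is a hand-written stand-in that, as far as I recall the Parsec source, has the same shape as the library's manyTill; expr, stmt, noTry and withTry are hypothetical names used only for this illustration.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Stand-in for manyTill: try `end` first, otherwise parse one `p` and loop.
manyTill' :: Parser a -> Parser end -> Parser [a]
manyTill' p end = scan
  where
    scan = (end >> return [])
       <|> ((:) <$> p <*> scan)

-- Simplified expression and statement parsers (digits only).
expr :: Parser String
expr = many1 digit

stmt :: Parser String
stmt = expr <* char ';'

-- Without try: on "5;" the end parser consumes "5", then fails on ';'.
-- Because input was consumed, <|> does not fall through to the stmt branch.
noTry :: Parser [String]
noTry = manyTill' (stmt <* spaces) (lookAhead (expr >> space))

-- With try: the failed end parser is rewound, so the stmt branch gets its turn.
withTry :: Parser [String]
withTry = manyTill' (stmt <* spaces) (try $ lookAhead (expr >> space))
```

In GHCi, parse noTry "" "5; 5 " should give a Left, while parse withTry "" "5; 5 " should give Right ["5"] and leave the last 5 unconsumed.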

    So the corrected parser looks like this:

    parseBlock = between
        (char '{' >> spaces)
        (spaces >> char '}')
        (do
            stmts <- manyTill (parseStmt <* spaces) 
                              (try $ lookAhead (parseExpr >> space))
            parseExpr
        )
    

    The above parser just returns the output of parseExpr, i.e. 5. If this is your intent, it can be simplified to:

    manyTill (parseStmt <* spaces) (try $ lookAhead (parseExpr >> space)) >> parseExpr
    

    If you actually need the parsed statements as well, it becomes:

    (do
        stmts <- manyTill (parseStmt <* spaces) 
                          (try $ lookAhead (parseExpr >> space))
        expr  <- parseExpr
        return (concat (stmts ++ [expr]))
    )
    

    For the input { 5; 5; 5 } it returns 555.
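For reference, here is a minimal, self-contained sketch of the manyTill approach from this answer, with type signatures added. Two things are my own assumptions and not part of the original code: it returns the statements and the final expression as a pair instead of concatenating them, and runBlock is a hypothetical helper for trying it out.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- An identifier: a letter followed by letters, digits or underscores.
parseIdentifier :: Parser String
parseIdentifier = (:) <$> letter <*> many (letter <|> digit <|> char '_')

-- An expression is an identifier or a run of digits.
parseExpr :: Parser String
parseExpr = parseIdentifier <|> many1 digit

-- A statement is an expression followed by ';'.
parseStmt :: Parser String
parseStmt = parseExpr <* char ';'

-- Statements first, then the final expression, kept separate in the result.
parseBlock :: Parser ([String], String)
parseBlock = between (char '{' >> spaces) (spaces >> char '}') $ do
    -- Stop collecting statements once an expression followed by
    -- whitespace is ahead; lookAhead leaves that expression unconsumed.
    stmts <- manyTill (parseStmt <* spaces)
                      (try $ lookAhead (parseExpr >> space))
    expr  <- parseExpr
    return (stmts, expr)

-- Hypothetical helper: run the block parser on a string.
runBlock :: String -> Either ParseError ([String], String)
runBlock = parse parseBlock "<input>"
```

In GHCi, runBlock "{ 5; 5; 5 }" should give Right (["5","5"],"5"). Note that the end parser parseExpr >> space requires whitespace after the final expression, so an input such as { 5; 5; 5} is rejected; looking ahead for parseExpr >> spaces >> char '}' instead is one way to relax that.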