I've run into a problem parsing a block of code with the following syntax:
{
<stmt>;
<stmt>;
<stmt>;
<expr>
}
A statement can be of the form <expr>; (an expression followed by a semicolon). This trips up Parsec in a way I don't know how to fix. This is probably because I'm fairly new to Haskell and the Parsec library, but I don't know where to look for a solution. I've written an example that captures my exact problem.
With the input { 5; 5; 5 } it fails on the third 5, because it expects a ; to follow it. How do I get around this?
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Combinator
parseIdentifier = do
    first <- letter
    rest <- many $ letter <|> digit <|> char '_'
    return $ first : rest
parseExpr = parseIdentifier <|> many1 digit
parseStmt = parseExpr <* char ';'
parseBlock = between
    (char '{' >> spaces)
    (spaces >> char '}')
    (do
        stmts <- try $ parseStmt `sepBy` spaces
        parseExpr
    )
readParser :: Parser String -> String -> String
readParser parser input = case parse parser "dusk" input of
    Left err -> show err
    Right val -> val
main = interact $ readParser parseBlock
Instead of sepBy, this sort of problem can often be solved with manyTill. The tricky part is keeping the input that manyTill must not consume (here, the final expression); for that you have to use try $ lookAhead.
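A minimal sketch of what "keeping the input" means, assuming parsec 3's Text.Parsec modules rather than the question's compatibility imports. The names withRest, eatsEnd and keepsEnd are hypothetical, and getInput is used only to show what is left over after manyTill stops.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Run p, then also return whatever input p left unconsumed.
withRest :: Parser a -> Parser (a, String)
withRest p = do
  x <- p
  rest <- getInput
  return (x, rest)

-- The terminator ';' is consumed by manyTill itself.
eatsEnd :: Parser (String, String)
eatsEnd = withRest (manyTill letter (char ';'))

-- Wrapped in lookAhead, the terminator stays in the input for the next parser.
keepsEnd :: Parser (String, String)
keepsEnd = withRest (manyTill letter (lookAhead (char ';')))

main :: IO ()
main = do
  print (parse eatsEnd "" "abc;rest")   -- Right ("abc","rest")
  print (parse keepsEnd "" "abc;rest")  -- Right ("abc",";rest")
```

In the block parser the terminator is the final expression, so it has to be left in the input exactly like the ';' in keepsEnd.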
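Side note: the reason can be found in the source code of Parsec. Internally, manyTill uses <|>, and <|> only tries its second alternative when the first one failed without consuming input. That is why try takes effect: it makes a failed parser look as if it consumed nothing. lookAhead, in turn, restores the input when its parser succeeds, so the expression it matched is still there for whatever is sequenced after it with >>= or >>. (If its parser fails after consuming input, lookAhead fails the same way, which is the other reason the try is needed.)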
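The two rules from the side note can be checked in isolation with a small sketch. Here expr and stmt are simplified, digit-only stand-ins (hypothetical names) for the question's parseExpr and parseStmt, and run is a hypothetical helper that discards the error message; it assumes parsec 3 and GHC 7.10 or later, where <* is in the Prelude.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Digit-only stand-ins for the question's parsers.
expr :: Parser String
expr = many1 digit

stmt :: Parser String
stmt = expr <* char ';'

-- Nothing on any parse error, Just the result otherwise.
run :: Parser a -> String -> Maybe a
run p = either (const Nothing) Just . parse p ""

-- stmt consumes "5" before failing on ' ', so <|> never tries expr.
noTry :: Parser String
noTry = stmt <|> expr

-- try rewinds the failed stmt, so <|> falls back to expr.
withTry :: Parser String
withTry = try stmt <|> expr

-- lookAhead alone: "5" is consumed before `space` fails on ';',
-- so the failure counts as consuming and the fallback is skipped.
peekNoTry :: Parser Bool
peekNoTry = (lookAhead (expr >> space) >> return True) <|> return False

-- try + lookAhead: succeeds without consuming, or fails without consuming.
peekWithTry :: Parser Bool
peekWithTry = (try (lookAhead (expr >> space)) >> return True) <|> return False

main :: IO ()
main = do
  print (run noTry "5 }")        -- Nothing
  print (run withTry "5 }")      -- Just "5"
  print (run peekNoTry "5;")     -- Nothing
  print (run peekWithTry "5;")   -- Just False
  print (run peekWithTry "5 }")  -- Just True
```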
So the corrected parser looks like this:
parseBlock = between
    (char '{' >> spaces)
    (spaces >> char '}')
    (do
        stmts <- manyTill (parseStmt <* spaces)
                          (try $ lookAhead (parseExpr >> space))
        parseExpr
    )
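For reference, the question's parsers and the corrected parseBlock can be assembled into one runnable sketch. It assumes parsec 3 (Text.Parsec instead of the compatibility modules) and swaps interact for a hypothetical runBlock helper so the results are easy to inspect. One thing worth noting: the terminator (parseExpr >> space) needs a whitespace character right after the last expression, so an input like {5;5;5} is rejected. parseBlock' is a hypothetical variant, not part of the answer above, that looks for the closing brace instead.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The question's parsers, unchanged apart from type signatures.
parseIdentifier :: Parser String
parseIdentifier = do
  first <- letter
  rest <- many (letter <|> digit <|> char '_')
  return (first : rest)

parseExpr :: Parser String
parseExpr = parseIdentifier <|> many1 digit

parseStmt :: Parser String
parseStmt = parseExpr <* char ';'

-- The fix from the answer: collect statements until an expression followed
-- by one whitespace character is ahead, then parse that expression.
parseBlock :: Parser String
parseBlock = between
  (char '{' >> spaces)
  (spaces >> char '}')
  (manyTill (parseStmt <* spaces)
            (try $ lookAhead (parseExpr >> space))
     >> parseExpr)

-- Hypothetical variant: stop when an expression followed by the closing
-- brace is ahead, so no whitespace is required before '}'.
parseBlock' :: Parser String
parseBlock' = between
  (char '{' >> spaces)
  (spaces >> char '}')
  (manyTill (parseStmt <* spaces)
            (try $ lookAhead (parseExpr >> spaces >> char '}'))
     >> parseExpr)

-- Hypothetical helper: Nothing on a parse error, Just the result otherwise.
runBlock :: Parser String -> String -> Maybe String
runBlock p = either (const Nothing) Just . parse p "dusk"

main :: IO ()
main = do
  print (runBlock parseBlock "{ 5; 5; 5 }")  -- Just "5"
  print (runBlock parseBlock "{5;5;5}")      -- Nothing
  print (runBlock parseBlock' "{5;5;5}")     -- Just "5"
```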
The parser above just returns the output of parseExpr, i.e. 5. If that is your intent, the do block can be simplified to:
manyTill (parseStmt <* spaces) (try $ lookAhead (parseExpr >> space)) >> parseExpr
If you actually need the parsed statements as well, it becomes:
    (do
        stmts <- manyTill (parseStmt <* spaces)
                          (try $ lookAhead (parseExpr >> space))
        expr <- parseExpr
        return (concat (stmts ++ [expr]))
    )
For the input { 5; 5; 5 } this returns 555.
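Concatenating loses the boundaries between statements. If they are needed separately, one option is to return them in a pair with the final expression. This is a sketch under the same assumptions as the answer's parser (parsec 3, same terminator); parseBlockParts is a hypothetical name.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Expression and statement parsers from the question.
parseIdentifier :: Parser String
parseIdentifier = do
  first <- letter
  rest <- many (letter <|> digit <|> char '_')
  return (first : rest)

parseExpr :: Parser String
parseExpr = parseIdentifier <|> many1 digit

parseStmt :: Parser String
parseStmt = parseExpr <* char ';'

-- Keep the statements and the final expression apart instead of
-- concatenating them into one string.
parseBlockParts :: Parser ([String], String)
parseBlockParts = between
  (char '{' >> spaces)
  (spaces >> char '}')
  (do
     stmts <- manyTill (parseStmt <* spaces)
                       (try $ lookAhead (parseExpr >> space))
     expr <- parseExpr
     return (stmts, expr))

main :: IO ()
main = print (parse parseBlockParts "dusk" "{ a; 5; x }")
-- prints: Right (["a","5"],"x")
```

The concatenated string from the answer can still be recovered from the pair with concat (stmts ++ [expr]).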