I'm trying to write a parser for a simple language; basically right now it has literals, ifs, function application and not much else.
Here's the code I've got:
import Text.ParserCombinators.Parsec
import Control.Monad (liftM)
data Expr = Term Term
| Apply Expr Expr
| If Expr Expr Expr
deriving (Show)
data Term = Bool Bool
| Num Double
| String String
| Identifier String
| Parens Expr
deriving (Show)
sstring s = spaces >> string s
schar c = spaces >> char c
keyword k = do
kw <- try (sstring k)
notFollowedBy alphaNum
return kw
pBool :: Parser Bool
pBool = do
bool <- keyword "True" <|> keyword "False"
case bool of
"True" -> return True
"False" -> return False
pDouble :: Parser Double
pDouble = do
ds <- many1 digit
dot <- optionMaybe $ char '.'
case dot of
Nothing -> return $ read ds
_ -> do
ds' <- many1 digit
return $ read (ds ++ "." ++ ds')
pString :: Parser String
pString = do
char '"'
str <- many1 $ noneOf "\""
char '"'
return str
pIdentifier :: Parser String
pIdentifier = spaces >> many1 letter
pParens :: Parser Expr
pParens = do
schar '('
expr <- pExpr
schar ')'
return expr
pTerm :: Parser Term
pTerm = try (liftM Bool pBool)
<|> try (liftM Num pDouble)
<|> try (liftM String pString)
<|> try (liftM Identifier pIdentifier)
<|> try (liftM Parens pParens)
-- TODO: make this left-associative
pApply :: Parser Expr
pApply = do
term <- pTerm'
mApp <- spaces >> optionMaybe pApply
return $ case mApp of
Just app -> Apply term app
Nothing -> term
-- pulls "parens" expressions out of terms
pTerm' :: Parser Expr
pTerm' = do
term <- pTerm
case term of
Parens expr -> return expr
otherwise -> return $ Term term
pIf :: Parser Expr
pIf = do
keyword "if"
cond <- pExpr
keyword "then"
ifTrue <- pExpr
keyword "else"
ifFalse <- pExpr
return $ If cond ifTrue ifFalse
pExpr :: Parser Expr
pExpr = try pIf <|> pApply
test parser = parse parser ""
Now, if I try to parse a single number expression in ghci, all is well:
> test pExpr "1"
Right (Term (Num 1.0))
Great! And many other things work too:
> test pExpr "1.234"
Right (Term (Num 1.234))
> test pApply "neg 1"
Right (Apply (Term (Identifier "neg")) (Term (Num 1.0)))
> test pExpr "f g 1"
Right (Apply (Term (Identifier "f")) (Apply (Term (Identifier "g")) (Term (Num 1.0))))
But now, if I try to parse an if
statement, I get an error:
> test pIf "if 1 then 2 else 3"
Left (line 1, column 4):
unexpected "1"
expecting space, "if", "True", "False", letter or "("
This doesn't make sense to me! Let's step through this, looking at the rule for parsing an if statement:
We parse an "if"
keyword (no problem). Then for the next parse (the 1
), we need to parse pExpr
, which itself can be an pIf
or a pApply
. Well it's not an if, so we try the apply, which itself tries pTerm'
, which tries pTerm
, which tries a pBool
, which fails, and then a pNum
, which succeeds! Then pTerm
succeeds with a Num 1.0
, so pTerm'
succeeds with a Term (Num 1.0)
, which means pExpr
succeeds with a Term (Num 1.0)
, and that gets passed into the cond
variable... right? Well, clearly not, because it's failing! I don't see why it should fail here.
You have problems with not eating all the spaces, and the then
and else
are being interpreted as identifiers. A lexeme
rule is handy for eating spaces after any token. Your pIdentifier
needs to explicitly check that it hasn't gobbled up a reserved word. I fixed these problems, and took the liberty of using some of the existing combinators, and changed to applicative style...
import Text.ParserCombinators.Parsec
import Control.Applicative hiding ((<|>))
data Expr = Term Term
| Apply Expr Expr
| If Expr Expr Expr
deriving (Show)
data Term = Bool Bool
| Num Double
| String String
| Identifier String
| Parens Expr
deriving (Show)
keywords = ["if", "then", "else", "True", "False"]
lexeme p = p <* spaces
schar = lexeme . char
keyword k = lexeme . try $
string k <* notFollowedBy alphaNum
pBool :: Parser Bool
pBool = (True <$ keyword "True") <|> (False <$ keyword "False")
pDouble :: Parser Double
pDouble = lexeme $ do
ds <- many1 digit
option (read ds) $ do
char '.'
ds' <- many1 digit
return $ read (ds ++ "." ++ ds')
pString :: Parser String
pString = lexeme . between (char '"') (char '"') . many1 $ noneOf "\""
pIdentifier :: Parser String
pIdentifier = lexeme . try $ do
ident <- many1 letter
if ident `elem` keywords
then unexpected $ "reserved word " ++ show ident
else return ident
pParens :: Parser Expr
pParens = between (schar '(') (schar ')') pExpr
pTerm :: Parser Term
pTerm = choice [ Bool <$> pBool
, Num <$> pDouble
, String <$> pString
, Identifier <$> pIdentifier
, Parens <$> pParens
]
-- TODO: make this left-associative
pApply :: Parser Expr
pApply = do
term <- pTerm'
option term $
Apply term <$> pApply
-- pulls "parens" expressions out of terms
pTerm' :: Parser Expr
pTerm' = do
term <- pTerm
case term of
Parens expr -> return expr
_ -> return $ Term term
pIf :: Parser Expr
pIf = If <$ keyword "if" <*> pExpr
<* keyword "then" <*> pExpr
<* keyword "else" <*> pExpr
pExpr :: Parser Expr
pExpr = pIf <|> pApply
test parser = parse (spaces *> parser <* eof) ""