I am attempting to develop a lambda calculus interpreter, which supports definitions of terms. For that, I'm using the Megaparsec library in Haskell.
I have the following grammar:
term := var | lambda var . term | term term
def := var = term
instruction := :e t | def
My parser should return a list of instruction
s, where :e t
will evaluate to whatever t
is reduced, and def
will allow me to have named definitions.
So far, I've managed to parse a very similar grammar, except I'm using semicolons to separate each instruction:
...
instruction := :e t; | def;
I'd like to remove the need for those semicolons, but I'm having no luck with that. My code is currently:
sc :: Parser ()
sc = L.space space1 (L.skipLineComment "--") (L.skipBlockComment "{-" "-}")
lexeme :: Parser a -> Parser a
lexeme = L.lexeme sc
symbol :: Text -> Parser Text
symbol = L.symbol sc
semicolon :: Parser Text
semicolon = symbol ";"
lowercaseWord :: Parser [Char]
lowercaseWord = some lowerChar
variable :: Parser Term
variable = lexeme lowercaseWord <&> Var
lamAbs :: Parser Term
lamAbs = do
symbol "lambda"
space
v <- lexeme lowercaseWord
space
symbol "."
space
t <- term
space
return $ Abs v t
term :: Parser Term
term = foldl1 App <$> sepBy1 (parens term <|> try lamAbs <|> variable) sc
where parens = between (symbol "(") (symbol ")")
definition :: Parser (Instruction Term)
definition = do
n <- lexeme lowercaseWord
space
symbol "="
space
t <- term
return $ IDef n t
commandEval :: Parser (Instruction Term)
commandEval = do
symbol ":e"
t <- term
optional newline
return $ IEval t
program :: Parser [Instruction Term]
program = sc *> sepEndBy (definition <|> commandEval) semicolon <* eof
parse :: ProgramParser (Instruction Term)
parse text = case runParser program "" text of
Left err -> Left $ errorBundlePretty err
Right ast -> Right ast
When I attempt to remove the need for semicolons by changing the definition of program
to:
program = sc *> some (definition <|> commandEval) <* eof
the parser will not be able to figure out that it needs to stop parsing a term
once it finds something like var = ...
, to continue parsing the next definition.
For example, the following input:
a = (lambda x . x)
x
:e b a
b = lambda y . y
returns the following error:
6:3:
|
6 | b = lambda y . y
| ^
unexpected '='
expecting ":e", "lambda", '(', end of input, lowercase letter, or newline
I tried to change the definition of term
to make it explicit that it should not be followed by a lowercase word and a =
. But that didn't work either:
term = foldl1 App <$> sepBy1 ((parens term <|> try lamAbs <|> variable) <* notFollowedBy newdef) sc
where parens = between (symbol "(") (symbol ")")
newdef = (lexeme lowercaseWord *> space) *> symbol "="
How can I make it so that I don't have the need for semicolons, and my parser automatically stops parsing a term application once it finds var = term
?
Thank you in advance for your help.
As noted by @Poselsky, the point of using Text.Megaparsec.Char.Lexer
is that you don't need (and shouldn't have) to deal with white space in your higher-level parsers. So, I'd start by deleting every use of space
in lamAbs
and definition
, deleting the optional newline
from commandEval
and replacing sepBy1 xxx sc
with some xxx
in term
. (In all these cases, these parsers perform no function. They are run on the input stream after the previous lexeme parser has already absorbed all trailing whitespace.)
I'd also move the lexeme
call into lowercaseWord
:
lowercaseWord :: Parser [Char]
lowercaseWord = lexeme (some lowerChar)
Then, except for your primitive parsers (now, symbol
and lowercaseWord
), the only place you should need to deal with whitespace is the sc
at the start of your top-level parser program
. Every other parser should be assumed to start at non-whitespace and absorb trailing whitespace, and those parsers can be combined in sequence to create another parser that follows the same rule, so no stray space
or sc
or newline
calls are needed.
That will simplify your code, though it won't fix your problem.
Your attempted fix is the right idea. The problem is that it can't parse the last variable before a newdef
because that variable
is followed by a newdef
.
You don't actually need to try to parse a whole definition. All you really need to do is make sure a variable
isn't followed by a symbol "="
, and you need to wrap it in a try
block to backtrack on the name parsed by variable
:
term :: Parser Term
term = foldl1 App <$> some (parens term <|> try lamAbs <|>
try (variable <* notFollowedBy (symbol "=")))
where parens = between (symbol "(") (symbol ")")
Do be warned that there are still some problems with your parser. It accepts the following input, for example, which you might not expect:
x = lambdalicious. man
(Note that this has nothing to do with removing the space
calls -- your original parser parsed this, too.)
The problem is that symbol "lambda"
can match the prefix of a larger word, so "lambdalicious"
is parsed as "lambda licious"
. There's a similar issue with symbol ":e"
, if you don't want ":eval"
to parse as ":e val"
.
The usual solution is to define:
reserved :: Text -> Parser Text
reserved s = lexeme $ try (string s <* notFollowedBy lowerChar)
and then the following work as intended:
reserved "lambda"
reserved ":e"
Here's my full test program with all the above revisions:
{-# LANGUAGE OverloadedStrings #-}
module LambdaParser where
import Data.Void
import Data.Functor
import Data.Text (Text)
import Text.Megaparsec hiding (parse)
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L
type Parser = Parsec Void Text
data Term = Var String | Instruction Term | Abs String Term | App Term Term deriving (Show)
data Instruction t = IDef String t | IEval Term deriving (Show)
sc :: Parser ()
sc = L.space space1 (L.skipLineComment "--") (L.skipBlockComment "{-" "-}")
lexeme :: Parser a -> Parser a
lexeme = L.lexeme sc
symbol :: Text -> Parser Text
symbol = L.symbol sc
reserved :: Text -> Parser Text
reserved s = lexeme $ try (string s <* notFollowedBy lowerChar)
semicolon :: Parser Text
semicolon = symbol ";"
lowercaseWord :: Parser [Char]
lowercaseWord = lexeme (some lowerChar)
variable :: Parser Term
variable = lowercaseWord <&> Var
lamAbs :: Parser Term
lamAbs = do
reserved "lambda"
v <- lowercaseWord
symbol "."
t <- term
return $ Abs v t
term :: Parser Term
term = foldl1 App <$> some (parens term <|> try lamAbs <|>
try (variable <* notFollowedBy (symbol "=")))
where parens = between (symbol "(") (symbol ")")
definition :: Parser (Instruction Term)
definition = do
n <- lowercaseWord
symbol "="
t <- term
return $ IDef n t
commandEval :: Parser (Instruction Term)
commandEval = do
reserved ":e"
t <- term
return $ IEval t
program :: Parser [Instruction Term]
program = sc *> some (definition <|> commandEval) <* eof
parse :: Text -> Either String [Instruction Term]
parse text = case runParser program "" text of
Left err -> Left $ errorBundlePretty err
Right ast -> Right ast
main = do
let ex1 = "a = (lambda x . x)\n x\n\n:e b a\n\nb = lambda y . y"
case parse ex1 of
Left err -> putStrLn err
Right pgm -> print pgm