So I'm trying to do the standard "write yourself a parser for a scheme-like language" exercise to figure out MegaParsec and monad transformers. Following the suggestions of many tutorials and blog posts, I'm using ReaderT
and local
to implement lexical scope.
I run into trouble trying to implement let*
. Both let
and let*
share the same syntax, binding variables for use in a subsequent expression. The difference between the two is that let*
lets you use a binding in subsequent ones, whereas let
doesn't:
(let ((x 1) (y 2)) (+ x y)) ; 3
(let* ((x 1) (y (+ x x)) (+ x y)) ; 3
(let ((x 1) (y (+ x x)) (+ x y)) ; Error unbound symbol "x"
My problem is that when parsing a let*
expression, I need to add the bindings to the current scope one-by-one so that each binding is available for use in the subsequent ones. This seems like a good use case for StateT
; allowing me to build up the local scope one binding at a time.
Then, having parsed all the new bindings, I can pass these, together with those inherited from the parent scope, to the third argument of the let*
expression, via local
.
I build my monad transformer stack as follows:
type Parser = Parsec Void String
type Env = Map.Map String Float
type RSParser = ReaderT Env (StateT Env Parser)
And here's the parser, simplified as much as I could while still making my point. In particular, Float
is the only data type and +
, *
, and let*
are the only commands.
data Op = Plus | Times
spaceConsumer :: Parser ()
spaceConsumer = Lexer.space space1
(Lexer.skipLineComment ";")
(Lexer.skipBlockComment "#|" "|#")
lexeme :: Parser a -> RSParser a
lexeme = lift . lift . Lexer.lexeme spaceConsumer
lParen, rParen :: RSParser Char
lParen = lexeme $ char '('
rParen = lexeme $ char ')'
plus, times :: RSParser Op
plus = lexeme $ char '+' $> Plus
times = lexeme $ char '*' $> Times
keyValuePair :: RSParser ()
keyValuePair = between lParen rParen $ do
state <- get
name <- lift . lift $ Lexer.lexeme spaceConsumer (some letterChar)
x <- num
modify (union (fromList [(name, x)]))
keyValuePairs :: RSParser ()
keyValuePairs = between lParen rParen (many keyValuePair) $> ()
num :: RSParser Float
num = lexeme $ Lexer.signed (return ()) Lexer.float
expr, var :: RSParser Float
expr = num <|> var <|> between lParen rParen (arithExpr <|> letStarExpr)
var = do
env <- ask
lift . lift $ do
name <- Lexer.lexeme spaceConsumer (some letterChar)
case Map.lookup name env of
Nothing -> mzero
Just x -> return x
arithExpr = do
op <- (plus <|> times) <?> "operation"
args <- many (expr <?> "argument")
return $ case op of
Plus -> sum args
Times -> product args
letStarExpr = lexeme (string "let*") *> do
keyValuePairs
bindings <- get
local (Map.union bindings) expr
main :: IO ()
main = do
parseTest (runStateT (runReaderT expr (fromList [("x", 1)])) Map.empty)
"(+ (let* ((x 666.0)) x) x)"
-- (667.0,fromList [("x",666.0)]) Ok
parseTest (runStateT (runReaderT expr (fromList [("x", 1)])) Map.empty)
"(+ (let* ((x 666.0)) x) (let* ((w 0.0)) x))"
-- (1332.0,fromList [("x",666.0)]) Wrong
The first test above succeeds, but the second fails. It fails because the mutable state holding x
's binding in the first let*
expression is carried over to the second let*
expression. I need a way to make the this mutable state local to the computation in question and this is what I can't figure out how to do. Is there an analogue of the local
command from Reader
for State
? Am I using the wrong monad transformer stack? Is my approach fundamentally flawed?
The naive (in retrospect) solution that I tried is resetting the mutable state at each let*
expression by adding a put Map.empty
statement to letStarExpr
:
letStarExpr = lexeme (string "let*") *> do
keyValuePairs
bindings <- get
put Map.empty
local (Map.union bindings) expr
But this is incompatible with nested let*
expressions:
parseTest (runStateT (runReaderT expr (fromList [("x", 1)])) Map.empty)
(let* ( (x 666.0) (y (let* ((z 3.0)) z)) ) x)
gives 1.0 instead of 666.0.
Any ideas?
As Alexis King pointed out in comments, it is standard practice to separate parsing from evaluation.
However, to address the current question, it is possible here to evaluate while parsing in an idiomatic way. The key point is the following: lexical scoping without any context-sensitive rules only ever requires a Reader
monad, for scope/type checking and evaluation as well. The reason is in the "lexical" property: purely nested scopes have no side effects on other branches of scope structure, hence there should be nothing to be carried around in a state. So it's best to just get rid of the State
.
The interesting part is letStarExpr
. There, we cannot use many
anymore, because it doesn't allow us to handle the newly bound names on each key-value pair. Instead, we can write a custom version of many
which uses local
to bind a new name on each recursive step. In the code example I just inline this function using fix
.
Another note: lift
should not be commonly used with mtl
; the point of mtl
is to eliminate most lifts. The megaparsec
exports are already generalized over MonadParsec
. Below is a code example with megaparsec
7.0.4, I did the mentioned changes and a few further stylistic ones.
import Control.Monad.Reader
import Data.Map as Map
import Data.Void
import Text.Megaparsec
import qualified Text.Megaparsec.Char as Char
import qualified Text.Megaparsec.Char.Lexer as Lexer
type Env = Map String Double
type Parser = ReaderT Env (Parsec Void String)
spaceConsumer :: Parser ()
spaceConsumer = Lexer.space Char.space1
(Lexer.skipLineComment ";")
(Lexer.skipBlockComment "#|" "|#")
lexeme = Lexer.lexeme spaceConsumer
symbol = Lexer.symbol spaceConsumer
char = lexeme . Char.char
parens :: Parser a -> Parser a
parens = between (char '(') (char ')')
num :: Parser Double
num = lexeme $ Lexer.signed (pure ()) Lexer.float
identifier :: Parser String
identifier = try $ lexeme (some Char.letterChar)
keyValuePair :: Parser (String, Double)
keyValuePair = parens ((,) <$> identifier <*> num)
expr :: Parser Double
expr = num <|> var <|> parens (arithExpr <|> letStarExpr)
var :: Parser Double
var = do
env <- ask
name <- identifier
maybe mzero pure (Map.lookup name env)
arithExpr :: Parser Double
arithExpr =
(((sum <$ char '+') <|> (product <$ char '*')) <?> "operation")
<*> many (expr <?> "argument")
letStarExpr :: Parser Double
letStarExpr = do
symbol "let*"
char '('
fix $ \go ->
(char ')' *> expr)
<|> do {(x, n) <- keyValuePair; local (insert x n) go}
main :: IO ()
main = do
parseTest (runReaderT expr (fromList [("x", 1)]))
"(+ (let* ((x 666.0)) x) x)"
parseTest (runReaderT expr (fromList [("x", 1)]))
"(+ (let* ((x 666.0)) x) (let* ((w 0.0)) x))"