I want to parse expressions with variables that start with a $ (as in $a=$b), using Parsec's Token and Expr modules. Here is a reduced version of my code:
module Main where
import Control.Monad.Identity
import Control.Applicative
import Text.Parsec
import Text.Parsec.String
import qualified Text.Parsec.Token as Tok
import qualified Text.Parsec.Language as Tok
import qualified Text.Parsec.Expr as Ex
data Expr
  = BinaryOp String Expr Expr
  | Var String
  deriving (Show)
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser style
  where
    style = Tok.emptyDef
      { Tok.reservedOpNames = ["="]
      , Tok.reservedNames = []
      , Tok.identStart = letter
      , Tok.identLetter = alphaNum
      }
reservedOp = Tok.reservedOp lexer
identifier = Tok.identifier lexer
whiteSpace = Tok.whiteSpace lexer
parseExpr :: String -> Either ParseError Expr
parseExpr = parse (whiteSpace *> expr <* eof) ""
expr :: Parser Expr
expr = Ex.buildExpressionParser opTable terms <?> "expression"
  where
    opTable =
      [ [ Ex.Infix (reservedOp "=" >> return (BinaryOp "=")) Ex.AssocLeft ] ]
    terms =
      try var
var :: Parser Expr
var = Var <$> (char '$' >> identifier)
--
main :: IO ()
main = case parseExpr "$a=$b" of
  Left err -> print err
  Right expr -> print expr
This works fine for expressions with whitespace around the operators (like $a = $b), but without whitespace ($a=$b) I get the error:
(line 1, column 5):
unexpected '$'
expecting operator
Also, if I modify the parser to parse variables that don't start with a $, the parser works with and without whitespace. So there seems to be a problem with the combination of $ and operators that don't have whitespace between them.
The problem is the definition of opStart and opLetter in the default language definition (emptyDef) that your token parser is built from:
emptyDef :: LanguageDef st
emptyDef = LanguageDef
  { commentStart = ""
  ...
  , opStart = opLetter emptyDef
  , opLetter = oneOf ":!#$%&*+./<=>?@\\^|-~"
  ...
  }
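The token parser greedily matches an operator name using opStart and opLetter. In particular, reservedOp "=" only succeeds if the = is not followed by another opLetter character. Because $ is in opLetter, $a=$b parses the same as $a =$ b, and since =$ is not an operator you get the syntax error.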
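To see this in isolation, here is a minimal sketch that runs only the "=" token parser against two inputs. The names defaultLexer, eqThenRest and acceptsEq are made up for this illustration; everything else is the stock Parsec API with the default opLetter left untouched.

```haskell
module Main where

import Text.Parsec
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Language as Tok
import qualified Text.Parsec.Token as Tok

-- Lexer built from emptyDef, so opLetter still contains '$'.
-- (defaultLexer is a hypothetical name used only in this sketch.)
defaultLexer :: Tok.TokenParser ()
defaultLexer = Tok.makeTokenParser Tok.emptyDef { Tok.reservedOpNames = ["="] }

-- Parse a single "=" operator token and return the remaining input.
eqThenRest :: Parser String
eqThenRest = Tok.reservedOp defaultLexer "=" *> getInput

-- True if the "=" token is accepted at the start of the input.
acceptsEq :: String -> Bool
acceptsEq = either (const False) (const True) . parse eqThenRest ""

main :: IO ()
main = do
  print (acceptsEq "= $b")  -- True: whitespace ends the operator token
  print (acceptsEq "=$b")   -- False: '$' is an opLetter, so "=" does not end here
```

The second case is the one that the expression parser hits in $a=$b.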
If you remove $ from opLetter, things should work. For example, with a reduced set that no longer contains $:
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser style
  where
    style = Tok.emptyDef
      { Tok.reservedOpNames = ["="]
      , Tok.reservedNames = []
      , Tok.identStart = letter
      , Tok.identLetter = alphaNum
      , Tok.opLetter = oneOf ":!#%" -- add this line
      }
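For completeness, here is a self-contained sketch of the whole program with the fix applied. It is a variation, not a verbatim copy, of the code above: as an assumption on my part it keeps the full default operator-letter set and drops only $, it derives Eq on Expr so results can be compared, and it inlines the lexer helpers.

```haskell
module Main where

import Text.Parsec
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Expr as Ex
import qualified Text.Parsec.Language as Tok
import qualified Text.Parsec.Token as Tok

data Expr
  = BinaryOp String Expr Expr
  | Var String
  deriving (Show, Eq)

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser style
  where
    style = Tok.emptyDef
      { Tok.reservedOpNames = ["="]
      , Tok.identStart      = letter
      , Tok.identLetter     = alphaNum
        -- Assumption: the default opLetter set with only '$' dropped.
      , Tok.opLetter        = oneOf ":!#%&*+./<=>?@\\^|-~"
      }

-- A variable is a '$' immediately followed by an identifier.
var :: Parser Expr
var = Var <$> (char '$' *> Tok.identifier lexer)

-- Expressions are variables combined with a left-associative "=".
expr :: Parser Expr
expr = Ex.buildExpressionParser opTable var <?> "expression"
  where
    opTable = [[Ex.Infix (BinaryOp "=" <$ Tok.reservedOp lexer "=") Ex.AssocLeft]]

parseExpr :: String -> Either ParseError Expr
parseExpr = parse (Tok.whiteSpace lexer *> expr <* eof) ""

main :: IO ()
main = mapM_ (print . parseExpr) ["$a=$b", "$a = $b"]
```

Both inputs should now print Right (BinaryOp "=" (Var "a") (Var "b")). Note that opStart is left alone here: in emptyDef it is defined as opLetter emptyDef, so overriding opLetter in your own style does not change it.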