
Parsec: parsing expressions with variables that start with '$' (no whitespace)


I want to parse expressions with variables that start with a $ (as in $a=$b), using Parsec's Token and Expr modules. Here is a reduced version of my code:

module Main where

import Control.Monad.Identity
import Control.Applicative

import Text.Parsec
import Text.Parsec.String

import qualified Text.Parsec.Token    as Tok
import qualified Text.Parsec.Language as Tok
import qualified Text.Parsec.Expr     as Ex

data Expr
  = BinaryOp String Expr Expr
  | Var String
  deriving (Show)

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser style
  where
    style = Tok.emptyDef
      { Tok.reservedOpNames = ["="]
      , Tok.reservedNames   = []
      , Tok.identStart      = letter
      , Tok.identLetter     = alphaNum
      }

reservedOp = Tok.reservedOp lexer
identifier = Tok.identifier lexer
whiteSpace = Tok.whiteSpace lexer

parseExpr :: String -> Either ParseError Expr
parseExpr = parse (whiteSpace *> expr <* eof) ""

expr :: Parser Expr
expr = Ex.buildExpressionParser opTable terms <?> "expression"
  where
    opTable =
      [ [ Ex.Infix (reservedOp "=" >> return (BinaryOp "=")) Ex.AssocLeft ] ]
    terms =
      try var

var :: Parser Expr
var = Var <$> (char '$' >> identifier)

--

main :: IO ()
main = case parseExpr "$a=$b" of
  Left err -> print err
  Right expr -> print expr

This works fine for expressions with whitespace around the operators (like $a = $b), but without whitespace ($a=$b) I get the error:

(line 1, column 5):
unexpected '$'
expecting operator

Also, if I modify the parser to parse variables that don't start with a $, the parser works with and without whitespace. So there seems to be a problem with the combination of $ and operators that don't have whitespace between them.


Solution

  • The problem is the definition of opLetter (which opStart reuses) in the default language definition, emptyDef:

    emptyDef   :: LanguageDef st
    emptyDef    = LanguageDef
                   { commentStart   = ""
                   ...
                   , opStart        = opLetter emptyDef
                   , opLetter       = oneOf ":!#$%&*+./<=>?@\\^|-~"
                   ...
                   }
    

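    reservedOp "=" only succeeds if the matched "=" is not followed by another opLetter character. Internally it runs notFollowedBy (opLetter languageDef) after the name, so that "=" is not mistaken for the start of a longer operator such as "==". Because $ is in opLetter, the "=$" in $a=$b looks like the start of a longer operator, reservedOp "=" fails, and you get the syntax error. With whitespace ($a = $b) the character after "=" is a space, so the check passes.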
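A minimal sketch to check this diagnosis in isolation, assuming parsec 3. The names lexerWith, defaultLexer, noDollarLexer and acceptsEq are hypothetical helpers introduced only for illustration. It runs Tok.reservedOp "=" by itself on input starting with "=$b" versus "= $b", once with emptyDef's opLetter and once with the same character set minus '$':

```haskell
module Main where

import Text.Parsec
import Text.Parsec.String (Parser)

import qualified Text.Parsec.Language as Tok
import qualified Text.Parsec.Token    as Tok

-- Hypothetical helper: a lexer whose only reserved operator is "=",
-- built from emptyDef with the given opLetter parser.
lexerWith :: Parser Char -> Tok.TokenParser ()
lexerWith opL = Tok.makeTokenParser Tok.emptyDef
  { Tok.reservedOpNames = ["="]
  , Tok.opLetter        = opL
  }

-- Uses emptyDef's own opLetter, which includes '$'.
defaultLexer :: Tok.TokenParser ()
defaultLexer = lexerWith (Tok.opLetter Tok.emptyDef)

-- The same operator characters with '$' removed.
noDollarLexer :: Tok.TokenParser ()
noDollarLexer = lexerWith (oneOf ":!#%&*+./<=>?@\\^|-~")

-- True if reservedOp "=" succeeds at the start of the input.
acceptsEq :: Tok.TokenParser () -> String -> Bool
acceptsEq lx input =
  either (const False) (const True) (parse (Tok.reservedOp lx "=") "" input)

main :: IO ()
main = do
  print (acceptsEq defaultLexer  "= $b")  -- a space follows "="
  print (acceptsEq defaultLexer  "=$b")   -- '$' is an opLetter here
  print (acceptsEq noDollarLexer "=$b")   -- '$' is no longer an opLetter
```

The first and third checks should succeed and the second should fail, which isolates the notFollowedBy check as the cause.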

    If you remove $ from opLetter (keeping the other default operator characters), things should work, e.g.:

    lexer :: Tok.TokenParser ()
    lexer = Tok.makeTokenParser style
      where
        style = Tok.emptyDef
          { Tok.reservedOpNames = ["="]
          , Tok.reservedNames   = []
          , Tok.identStart      = letter
          , Tok.identLetter     = alphaNum
          , Tok.opLetter        = oneOf ":!#%&*+./<=>?@\\^|-~"  -- add this line: the default set without '$'
          }
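For completeness, here is a sketch of the question's program with that lexer change applied. It assumes parsec 3 and a GHC whose Prelude exports <$>, *> and <* (base 4.8 or later), so the unused Control.Monad.Identity and Control.Applicative imports are dropped; the opLetter set is emptyDef's default minus '$'.

```haskell
module Main where

import Text.Parsec
import Text.Parsec.String (Parser)

import qualified Text.Parsec.Expr     as Ex
import qualified Text.Parsec.Language as Tok
import qualified Text.Parsec.Token    as Tok

data Expr
  = BinaryOp String Expr Expr
  | Var String
  deriving (Show)

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser style
  where
    style = Tok.emptyDef
      { Tok.reservedOpNames = ["="]
      , Tok.reservedNames   = []
      , Tok.identStart      = letter
      , Tok.identLetter     = alphaNum
        -- Default operator characters minus '$', so "=$" is no longer
        -- treated as the start of a longer operator.
      , Tok.opLetter        = oneOf ":!#%&*+./<=>?@\\^|-~"
      }

reservedOp :: String -> Parser ()
reservedOp = Tok.reservedOp lexer

identifier :: Parser String
identifier = Tok.identifier lexer

whiteSpace :: Parser ()
whiteSpace = Tok.whiteSpace lexer

parseExpr :: String -> Either ParseError Expr
parseExpr = parse (whiteSpace *> expr <* eof) ""

expr :: Parser Expr
expr = Ex.buildExpressionParser opTable var <?> "expression"
  where
    opTable =
      [ [ Ex.Infix (reservedOp "=" >> return (BinaryOp "=")) Ex.AssocLeft ] ]

-- A variable is '$' immediately followed by an identifier.
var :: Parser Expr
var = Var <$> (char '$' >> identifier)

main :: IO ()
main = mapM_ (print . parseExpr) ["$a=$b", "$a = $b"]
```

Both inputs should now print Right (BinaryOp "=" (Var "a") (Var "b")).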