Search code examples
haskelltokenlexereither

Haskell lexer input and returning list of tokens or error


Hi i am working on a problem where lexer takes in input string and return either list of tokens or the error. If anyone knows how to go about solving the problem please help. thank you :)

lexer :: String -> Either Error [Token]

I have gotten problem to be working without using either but can't figure out with either functionality of Haskell. I want left to give an error and right to make list of token with pattern matching with several conditons.

Here's the code that doesn't work:

lexar (x:xs) 
  | isSpace x = Main.lexar xs 
  | isDigit x = Right (Literal (show x): Main.lexar xs) 
  | x == '+' = Right (Plus : Main.lexar xs) 
  | x == '-' = Right (Minus : Main.lexar xs) 
  | x == '*' = Right (Mult : Main.lexar xs) 
  | x == '/' = Right (Div : Main.lexar xs) 
  | x == '(' = Right (LeftP : Main.lexar xs) 
  | x == ')' = Right (RightP : Main.lexar xs) 
  | otherwise = Left (error "fail")

Solution

  • First, you can't use the : operator with the return value of lexar.

    This version of lexar returns an Either, which is not a list, and therefore cannot be the second argument to operator :.

    Every time you call lexar recursively, it returns you an Either, which may be either Left or Right. If it's Left, it contains the error, and you would just pass it up the stack. If it's Right, then it contains the result of downstream lexing, and you would prepend the current token to it, then wrap it in a Right again before returning. Let's write that down:

      | isDigit x = 
          case lexar xs of
              Left err -> Left err
              Right tokens -> Right (Literal (show x) : tokens)
    

    Of course, this would become very tedious if you had to write a whole case expression for every single token. So instead you could wrap that in a helper function:

    lexar (x:xs)
      ...
      | isDigit x = prependToken (Literal (show x)) 
      | x == '+' = prependToken Plus
      ...
      where
          prependToken t = 
              case lexar xs of
                  Left err -> Left err
                  Right tokens -> Right (t : tokens)
    

    If you're comfortable enough with more abstract concepts, you could also utilize the Functor instance of Either by using operator <$>:

    lexar (x:xs)
      ...
      | isDigit x = (Literal (show x) :) <$> lexar xs
      | x == '+' = (Plus :) <$> lexar xs
      ...
    

    The standard library implementation of operator <$> for Either will apply the given function to the Right value or return the Left value unchanged. Essentially it's exactly what my helper function prependToken is doing.


    Finally, when returning a Left, don't call the error function. This function will crash the program when its result is evaluated, which is not what you want. Just wrap the string in Left:

      | otherwise = Left "fail"