Search code examples
parsinghaskellfunctional-programmingtokenmonads

How do I generalize a Parser which extracts values from tokens?


I'm creating a parser in Haskell for a simple language. The parser takes a list of tokens, generated by a separate tokenizer, and turns it into a structured tree. While I was writing this Parser, I came across a reoccurring pattern that I would like to generalize. Here are the relevant definitions:

--Parser Definition
data Parser a b = Parser { runParser :: [a] -> Maybe ([a], b) }

--Token Definition
data Token = TokOp    { getTokOp :: Operator }
           | TokInt   { getTokInt :: Int }
           | TokIdent { getTokIdent :: String }
           | TokAssign
           | TokLParen
           | TokRParen
   deriving (Eq, Show)

The parser also has instances defined for MonadPlus and all of its super classes. Here are two examples of the reoccurring Pattern I am trying to generalize:

-- Identifier Parser
identP :: Parser Token String
identP = Parser $ \input -> case input of
  TokIdent s : rest -> Just (rest, s)
  _                 -> Nothing

--Integer Parser
intP :: Parser Token Int
intP = Parser $ \input -> case input of
  TokInt n : rest -> Just (rest, n)
  _               -> Nothing

As you can see, these two examples are incredibly similar, yet I see no way generalize it. Ideally I would want some function of type extractToken :: ?? -> Parser Token a where a is the value contained by the token. I am guessing the solution involves >>=, but I am not experienced enough with Monads to figure it out. This also may be impossible, I am not sure.


Solution

  • It seems difficult to avoid at least some boilerplate. One simple approach is to manually define the field selectors to return Maybes:

    {-# LANGUAGE LambdaCase #-}
    
    data Token = TokOp Operator
               | TokInt Int
               | TokIdent String
               | TokAssign
               | TokLParen
               | TokRParen
       deriving (Eq, Show)
    
    getTokOp    = \case { TokOp x    -> Just x ; _ -> Nothing }
    getTokInt   = \case { TokInt x   -> Just x ; _ -> Nothing }
    getTokIdent = \case { TokIdent x -> Just x ; _ -> Nothing }
    

    and then the rest is:

    fieldP :: (Token -> Maybe a) -> Parser Token a
    fieldP sel = Parser $ \case tok:rest -> (,) rest <$> sel tok
                                []       -> Nothing
    
    opP    = fieldP getTokOp
    identP = fieldP getTokIdent
    intP   = fieldP getTokInt
    

    You could derive the getXXX selectors using Template Haskell or generics, though it's likely not worth it.