Search code examples
parsinghaskellmonadsparser-combinatorstrifecta

Custom state in Trifecta


I'm using Trifecta parser combinator library and my parser outputs instances of AST data type. I want each instance to have unique ID (which is simple Int).

In Parsec I would create custom state and increment the ID when neccesery. How can we do it in Trifecta?


Solution

  • You can enhance the Parser monad with the StateT monad transformer to get what you want. This integrates well with the rest of the library, as most of the combinators use type classes rather than concrete types (meaning you don't have to do much lifting for the code to work). Here is a decent example of this. It parses a grammar with identifiers and symbols separated by whitespace. Each identifier is give a unique number.

    module Main where
    import Text.Trifecta
    import Control.Monad.State
    import Control.Applicative
    import Data.Monoid
    
    data Identifier = Identifier String Int deriving (Show)
    
    identifier :: StateT Int Parser Identifier
    identifier = do
      name <- some letter
      newId <- get
      modify (+1)
      return $ Identifier name newId
    
    symbolToken :: Parser Char
    symbolToken = oneOf "+-*/"
    
    data Token = IdentifierToken Identifier | SymbolToken Char deriving (Show)
    
    singleToken :: StateT Int Parser Token
    singleToken = try (IdentifierToken <$> identifier) <|> (SymbolToken <$> lift symbolToken)
    
    parseTokens :: StateT Int Parser [Token]
    parseTokens = singleToken `sepBy1` spaces
    
    testParse :: String -> Result [Token]
    testParse = parseString (evalStateT parseTokens 0) mempty 
    
    test1 :: Result [Token]
    test1 = testParse "these are identifiers and + some / symbols -"
    

    test1 results in:

    Success [IdentifierToken (Identifier "these" 0)
    ,IdentifierToken (Identifier "are" 1)
    ,IdentifierToken (Identifier "identifiers" 2)
    ,IdentifierToken (Identifier "and" 3)
    ,SymbolToken '+',IdentifierToken (Identifier "some" 4)
    ,SymbolToken '/',IdentifierToken (Identifier "symbols" 5),SymbolToken '-']