
How can I interpolate values into a string based on a key token using Parsec (Haskell)?


I'm new to the world of parsing, and have a fairly simple-seeming problem:

I have a long string made up of Chunks of normal text and Keys that are encoded like <<key-label>>.

import Text.Parsec
import Text.Parsec.String (Parser)

data Merge a = Chunk a
             | Key a
  deriving (Show)

key :: Parser (Merge String)
key = Key <$> between (string "<<") (string ">>") (many1 letter)

chunk :: Parser (Merge String)
chunk = Chunk <$> many1 anyChar

prose :: Parser [Merge String]
prose = many1 $ key <|> chunk

ex = parseTest prose "hi <<x>> ! Do you like <<y>>?"

-- Returns: 
-- [Chunk "hi <<x>> ! Do you like <<y>>?"]

-- I'd like:
-- [Chunk "hi ", Key "x", Chunk " ! Do you like ", Key "y", Chunk "?"]

I'd like to replace those keys with values, but I can solve that myself once I can parse a string into my tokens, i.e. String -> [Merge String].

I've dived into the boundless depths that is lexing/parsing, and while I hope to learn all of it eventually, any guidance on solving this problem now?

This is the simplest version of my attempts, although I have also tried separate passes over the data, including separate lexing and parsing steps. I'd like to use Parsec rather than a dedicated interpolation library.


Solution

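  • The original chunk uses many1 anyChar, which swallows the whole input, keys included. You can use notFollowedBy to say that a chunk may include a character only if a key does not start at that position. notFollowedBy doesn't consume input, so when the chunk stops, prose goes on to parse the key again as its own item.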

    chunk = Chunk <$> many1 (notFollowedBy key >> anyChar)
    

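    This will allow even things like aaa<<bbbbbb to be parsed as a chunk: at the <<, the lookahead runs key, reads letters up to the end of the input, finds no closing >>, and concludes that this was not a key, so the characters can be part of the chunk. Note that if the input itself begins with an unclosed <<, key in prose fails after consuming input, so <|> never falls back to chunk unless key is wrapped in try.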

    If you would rather have << always be the start of a key and fail if it isn't closed, disallow << from the chunk:

    chunk = Chunk <$> many1 (notFollowedBy (string "<<") >> anyChar)
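As a rough sketch of how the pieces above might fit together, the module below puts both chunk variants next to each other and adds the substitution step the question mentions. The names lenientChunk, strictChunk, proseWith, render and interpolate are hypothetical, and looking values up in a Data.Map is an assumption, not something from the question. It also makes three changes that are not in the answer: key is wrapped in try so that a failed key never leaves input consumed, the top-level parser uses many instead of many1 so that an empty string parses to an empty list, and it ends with eof so that leftover input is reported as an error.

```haskell
import qualified Data.Map as Map
import Text.Parsec
import Text.Parsec.String (Parser)

data Merge a = Chunk a
             | Key a
  deriving (Show, Eq)

-- A key is one or more letters wrapped in << and >>.
-- try (an addition to the original) rewinds the input if the key fails partway.
key :: Parser (Merge String)
key = try (Key <$> between (string "<<") (string ">>") (many1 letter))

-- Lenient variant: a chunk stops only where a complete key begins,
-- so an unclosed << is treated as ordinary text.
lenientChunk :: Parser (Merge String)
lenientChunk = Chunk <$> many1 (notFollowedBy key >> anyChar)

-- Strict variant: a chunk never contains <<, so an unclosed << is an error.
strictChunk :: Parser (Merge String)
strictChunk = Chunk <$> many1 (notFollowedBy (string "<<") >> anyChar)

-- Hypothetical helper: alternate keys and chunks until the end of the input.
proseWith :: Parser (Merge String) -> Parser [Merge String]
proseWith chunk = many (key <|> chunk) <* eof

-- Replace every Key with its value; the first missing key is reported as Left.
render :: Map.Map String String -> [Merge String] -> Either String String
render env = fmap concat . traverse step
  where
    step (Chunk s) = Right s
    step (Key k)   = maybe (Left ("unknown key: " ++ k)) Right (Map.lookup k env)

-- Parse with the strict chunk, then substitute values.
interpolate :: Map.Map String String -> String -> Either String String
interpolate env input =
  case parse (proseWith strictChunk) "template" input of
    Left err     -> Left (show err)
    Right pieces -> render env pieces
```

With these definitions, parse (proseWith lenientChunk) "" "hi <<x>> ! Do you like <<y>>?" should give Right [Chunk "hi ",Key "x",Chunk " ! Do you like ",Key "y",Chunk "?"], which is the token list the question asks for. On aaa<<bbbbbb the lenient parser yields a single chunk, while the strict one returns a parse error. Note that many1 letter does not accept a hyphen, so a key such as <<key-label>> would need a wider character set than the one in the question's key parser.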