I'm new to the world of parsing, and have a fairly simple-seeming problem:
I have a long string made up of Chunks of normal text and Keys that are encoded like <<key-label>>.
import Text.Parsec
import Text.Parsec.String (Parser)

data Merge a = Chunk a
             | Key a
             deriving (Show)

key :: Parser (Merge String)
key = Key <$> between (string "<<") (string ">>") (many1 letter)

chunk :: Parser (Merge String)
chunk = Chunk <$> many1 anyChar

prose :: Parser [Merge String]
prose = many1 $ key <|> chunk

ex :: IO ()
ex = parseTest prose "hi <<x>> ! Do you like <<y>>?"
-- Returns:
-- [Chunk "hi <<x>> ! Do you like <<y>>?"]
-- I'd like:
-- [Chunk "hi ", Key "x", Chunk " !", ...]
I'd like to replace those keys with values, but I can solve that if I can parse a string into my tokens, i.e. String -> [Merge String].
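Once the string is a [Merge String], the replacement step is a single pass over the tokens. A minimal sketch, assuming the values live in a Data.Map (from the containers package that ships with GHC) and that a key with no value should be written back out unchanged; render is a hypothetical name, and Eq is derived only so results can be compared:

```haskell
import qualified Data.Map as Map

data Merge a = Chunk a | Key a
  deriving (Show, Eq)

-- Hypothetical helper: concatenate the chunks, substituting each key
-- with its value from the map.
-- Assumption: a key with no value is written back out as <<key>>.
render :: Map.Map String String -> [Merge String] -> String
render env = concatMap piece
  where
    piece (Chunk s) = s
    piece (Key k)   = Map.findWithDefault ("<<" ++ k ++ ">>") k env
```

For example, render (Map.fromList [("x", "Ann")]) [Chunk "hi ", Key "x", Chunk "! ", Key "y"] gives "hi Ann! <<y>>".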
I've dived into the boundless depths of lexing/parsing, and while I hope to learn all of it eventually, can anyone offer guidance on solving this problem now?
This is the simplest version of my attempts, although I have also tried separate passes over the data, including separate lexing and parsing steps. I'd like to use parsec rather than a dedicated interpolation library.
You can use notFollowedBy to say that you want a chunk to include a character as long as it isn't the start of a key. notFollowedBy doesn't consume input, so prose will still go on to parse the key again as its own item.
chunk = Chunk <$> many1 (notFollowedBy key >> anyChar)
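Putting that chunk together with the question's key gives the complete program below. It is a sketch assuming parsec 3 (Text.Parsec); beyond the original it adds Eq to the deriving clause and ends prose with <* eof, so leftover input is reported instead of being silently ignored.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Eq is derived in addition to Show so results can be compared directly.
data Merge a = Chunk a | Key a
  deriving (Show, Eq)

-- A key is one or more letters between "<<" and ">>".
key :: Parser (Merge String)
key = Key <$> between (string "<<") (string ">>") (many1 letter)

-- A chunk takes characters one at a time and stops right before anything
-- that parses as a complete key. notFollowedBy never consumes input.
chunk :: Parser (Merge String)
chunk = Chunk <$> many1 (notFollowedBy key >> anyChar)

-- Alternate between keys and chunks; eof (an addition) rejects leftovers.
prose :: Parser [Merge String]
prose = many1 (key <|> chunk) <* eof
```

In GHCi, parseTest prose "hi <<x>> ! Do you like <<y>>?" should print [Chunk "hi ",Key "x",Chunk " ! Do you like ",Key "y",Chunk "?"].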
This will allow even things like aaa<<bbbbbb to be parsed as a chunk: the parser goes all the way to the end of the input without finding a closing >>, decides that it must not have been a key, and therefore lets it be part of the chunk.
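One edge case the lenient version does not cover: when an unclosed << starts an item (for example directly after another key, as in <<x>><<y), key consumes the << before failing, and parsec's <|> does not fall back to chunk once input has been consumed, so the whole parse fails. Wrapping key in try is one way to get the "it's just text" behaviour there too. The sketch below assumes that is the behaviour you want; lenientProse is a hypothetical name, and Eq and <* eof are additions as before.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Merge a = Chunk a | Key a
  deriving (Show, Eq)

key :: Parser (Merge String)
key = Key <$> between (string "<<") (string ">>") (many1 letter)

-- Lenient chunk: an unclosed "<<" is treated as ordinary text.
chunk :: Parser (Merge String)
chunk = Chunk <$> many1 (notFollowedBy key >> anyChar)

-- try lets <|> fall back to chunk when key has consumed "<<" and then
-- failed. Without it, "<<x>><<y" is a parse error, because the unclosed
-- "<<y" starts an item and key consumes input before failing.
lenientProse :: Parser [Merge String]
lenientProse = many1 (try key <|> chunk) <* eof
```

With this, "aaa<<bbbbbb" parses as [Chunk "aaa<<bbbbbb"] and "<<x>><<y" as [Key "x",Chunk "<<y"].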
If you would rather have << always be the start of a key, and fail if it isn't closed, disallow << from the chunk:
chunk = Chunk <$> many1 (notFollowedBy (string "<<") >> anyChar)
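A sketch of the strict variant as a whole parser, under the same assumptions (parsec 3, Eq derived for comparison, <* eof added; strictProse is a hypothetical name). Here key is deliberately not wrapped in try, so an unclosed << becomes a parse error instead of being absorbed into a chunk:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Merge a = Chunk a | Key a
  deriving (Show, Eq)

key :: Parser (Merge String)
key = Key <$> between (string "<<") (string ">>") (many1 letter)

-- Strict chunk: stops before any "<<", so "<<" can only start a key.
-- A single '<' is still allowed: string "<<" consumes it and then fails,
-- but notFollowedBy wraps its argument in try, so nothing is consumed.
chunk :: Parser (Merge String)
chunk = Chunk <$> many1 (notFollowedBy (string "<<") >> anyChar)

-- No try around key: once "<<" has been read, a missing ">>" (or a
-- non-letter label) is reported as a parse error.
strictProse :: Parser [Merge String]
strictProse = many1 (key <|> chunk) <* eof
```

With this version, parse strictProse "" "aaa<<bbbbbb" returns a Left, while a lone < as in "a < b <<x>>" is still ordinary text.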