haskell monads text-parsing lexer megaparsec

Mixing Parser Char (lexer?) vs. Parser String

I've written several compilers and am familiar with lexers, regexes/NFAs/DFAs, parsers and semantic rules in flex/bison, JavaCC, JavaCup, ANTLR4 and so on.

Is there some sort of magical monadic operator that seamlessly builds up a token from a mix of Parser Char parsers (i.e. those in Text.Megaparsec.Char) and Parser String parsers?

Is there a way, or a best practice, to cleanly separate lexing tokens from nonterminal expectations?

Solution

Typically, one uses applicative operations to combine Parser Char and Parser String parsers directly, rather than "upgrading" the former. For example, a parser for alphanumeric identifiers that must start with a letter would probably look like:

ident :: Parser String
ident = (:) <$> letterChar <*> many alphaNumChar
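
To try the identifier parser as a standalone program, a minimal sketch might look like the following. The `Parser` alias (no custom error type, `String` input) is an assumption, since the question does not show how `Parser` is defined; note that the tail of the identifier needs `many alphaNumChar` so that it has type `Parser String`.

```haskell
import Data.Void (Void)
import Text.Megaparsec (Parsec, many, parseMaybe)
import Text.Megaparsec.Char (alphaNumChar, letterChar)

-- Assumed parser type: no custom error component, String input.
type Parser = Parsec Void String

-- One letter, consed onto zero or more alphanumeric characters.
ident :: Parser String
ident = (:) <$> letterChar <*> many alphaNumChar

main :: IO ()
main = do
  print (parseMaybe ident "x1")  -- Just "x1"
  print (parseMaybe ident "1x")  -- Nothing (must start with a letter)
```

`parseMaybe` only succeeds if the whole input is consumed, which makes it convenient for quick checks like these.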

If you were doing something more complicated, like parsing dollar amounts with optional cents, for example, you might write:

dollars :: Parser String
dollars = (:) <$> char '$' <*> some digitChar
          <**> pure (++)
          <*> option "" ((:) <$> char '.' <*> replicateM 2 digitChar)
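
The `<**> pure (++)` step can be hard to read at first: it applies `(++)` to the string parsed so far, so that the following `<*>` supplies the second argument. As a sketch, here is the same parser next to an equivalent spelling with named pieces. The names `dollarsExplicit`, `whole` and `cents` are made up for illustration, and the `Parser` alias is an assumption.

```haskell
import Control.Applicative ((<**>))
import Control.Monad (replicateM)
import Data.Void (Void)
import Text.Megaparsec (Parsec, option, parseMaybe, some)
import Text.Megaparsec.Char (char, digitChar)

-- Assumed parser type: no custom error component, String input.
type Parser = Parsec Void String

-- Original form: (<**>) hands the String parsed so far to (++).
dollars :: Parser String
dollars = (:) <$> char '$' <*> some digitChar
          <**> pure (++)
          <*> option "" ((:) <$> char '.' <*> replicateM 2 digitChar)

-- Equivalent form with the two halves named (hypothetical names).
dollarsExplicit :: Parser String
dollarsExplicit = (++) <$> whole <*> option "" cents
  where
    whole = (:) <$> char '$' <*> some digitChar               -- "$" and digits
    cents = (:) <$> char '.' <*> replicateM 2 digitChar       -- "." and 2 digits

main :: IO ()
main = mapM_ check ["$12", "$12.50", "$12.5", "12.50"]
  where
    -- Print each input with the result of both parsers side by side.
    check s = print (s, parseMaybe dollars s, parseMaybe dollarsExplicit s)
```

Both versions should agree on every input: they accept "$12" and "$12.50", and reject "$12.5" (only one cents digit) and "12.50" (no dollar sign).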

If you often find yourself building a Parser String out of a complicated sequence of Parser Char and Parser String parsers, you could define a few helper operators like the ones below. If you find the variety of operators annoying, you could instead define only (<++>) plus a short alias for charToStr (a function that lifts a Parser Char to a Parser String), such as c :: Parser Char -> Parser String.

(<.+>) :: Parser Char -> Parser String -> Parser String
p <.+> q = (:) <$> p <*> q
infixr 5 <.+>

(<++>) :: Parser String -> Parser String -> Parser String
p <++> q = (++) <$> p <*> q
infixr 5 <++>

(<..>) :: Parser Char -> Parser Char -> Parser String
p <..> q = p <.+> fmap (:[]) q
infixr 5 <..>

With these operators, you can write something like:

dollars' :: Parser String
dollars' = char '$' <.+> some digitChar
           <++> option "" (char '.' <.+> digitChar <..> digitChar)

As @leftroundabout says, there's nothing hackish about fmap (:[]). If you think it looks clearer, write fmap (\c -> [c]) instead.
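
For the single-operator variant mentioned earlier, here is a rough sketch that keeps only (<++>) and lifts every Parser Char with a short helper `c`. The name `dollarsC` and the `Parser` alias are assumptions made for this example.

```haskell
import Data.Void (Void)
import Text.Megaparsec (Parsec, option, parseMaybe, some)
import Text.Megaparsec.Char (char, digitChar)

-- Assumed parser type: no custom error component, String input.
type Parser = Parsec Void String

-- Run two String parsers in sequence and concatenate their results.
(<++>) :: Parser String -> Parser String -> Parser String
p <++> q = (++) <$> p <*> q
infixr 5 <++>

-- Lift a single-character parser to a one-character String parser.
c :: Parser Char -> Parser String
c = fmap (:[])

-- Dollar amount with optional two-digit cents, using only (<++>) and c.
dollarsC :: Parser String
dollarsC = c (char '$') <++> some digitChar
           <++> option "" (c (char '.') <++> c digitChar <++> c digitChar)

main :: IO ()
main = do
  print (parseMaybe dollarsC "$12.50")  -- Just "$12.50"
  print (parseMaybe dollarsC "$12")     -- Just "$12"
```

The trade-off is a little more noise at each character parser in exchange for having a single operator to remember.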