haskell parsec

How can I use a Parsec parser which has a different stream type than another Parsec parser?


I have a parser written with Text as the stream type, whereas the Text.Parsec.String module uses String as its stream type.

How can I use the custom written parser (Parsec Text b c) in the context of Parsec String b c?

Essentially it seems I would need such a function:

f :: Parsec Text b c -> Parsec String b c
f = undefined

Although it sounds possible, it seems like it might be quite complex to do.


Solution

  • It's gruesome, but relatively straightforward. The idea is to use the low-level functions runParsecT and mkPT to deconstruct and reconstruct the parser, bracketing it with adapters to modify the stream type of the incoming and outgoing state:

    import Text.Parsec
    import Data.Text (Text)
    import qualified Data.Text as Text
    
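    -- Run a Text-stream parser inside a String-stream parser by converting
    -- the parser state on the way in and the reply on the way out.
    stringParser :: (Monad m) => ParsecT Text u m a -> ParsecT String u m a
    stringParser p = mkPT $ \st -> (fmap . fmap . fmap) outReply $ runParsecT p (inState st)
      where inState :: State String u -> State Text u
            -- Pack the remaining String input into Text before running p.
            inState  (State i pos u) = State (Text.pack i) pos u
            outReply :: Reply Text u a -> Reply String u a
            -- On success, unpack the leftover Text input back into a String.
            outReply (Ok a (State i pos u) e) = Ok a (State (Text.unpack i) pos u) e
            -- Errors carry no input state, so they pass through unchanged.
            outReply (Error e) = Error e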
    

    It seems to work okay:

    myTextParser :: Parsec Text () String
    myTextParser = (:) <$> oneOf "abc" <*> many letter
    
    myStringParser :: Parsec String () (String, String)
    myStringParser = (,) <$> p <* spaces <*> p
      where p = stringParser myTextParser
    
    main :: IO ()
    main = do
      -- parseTest prints the result (or the parse error) itself.
      parseTest myStringParser "avocado butter"
      parseTest myStringParser "apple error"
    

    giving:

    λ> main
    ("avocado","butter")
    parse error at (line 1, column 7):
    unexpected "e"
    expecting space
    

    However, this is likely to cause serious performance problems in anything beyond a small, toy parser. Each pack call takes the entire incoming stream and converts it to a Text value. If you are parsing a lazy String (e.g., from a lazy I/O call), the first use of a converted parser will read the entire string into memory as a Text and pump the unconsumed remainder back out as a String; every further call to a converted parser will re-pack the remaining stream as Text. Switching to lazy Text won't really help, since pack still packs the whole input into the "lazy" Text value.

    You'll need to run some tests/benchmarks to see if this performance hit is acceptable in your application. Generally speaking, rewriting the Text parser (or seeing if it will compile with an abstract stream type) will be a better approach.
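As a rough illustration of the "abstract stream type" suggestion, here is a minimal sketch that assumes the Text parser only uses character-level combinators, so its signature can be loosened to any stream of Char. The names wordParser and pairParser are made up for this example, and whether your real parser generalizes this easily depends on what it does with its input.

```haskell
{-# LANGUAGE FlexibleContexts #-}

import Text.Parsec
import qualified Data.Text as Text

-- Same grammar as myTextParser, but polymorphic in the stream type s
-- instead of being fixed to Text. (wordParser is a hypothetical name.)
wordParser :: Stream s m Char => ParsecT s u m String
wordParser = (:) <$> oneOf "abc" <*> many letter

-- A parser built from it stays polymorphic as well.
pairParser :: Stream s m Char => ParsecT s u m (String, String)
pairParser = (,) <$> wordParser <* spaces <*> wordParser

main :: IO ()
main = do
  -- The same parser runs on a String ...
  parseTest pairParser "avocado butter"
  -- ... and on a Text, with no pack/unpack adapter in between.
  parseTest pairParser (Text.pack "avocado butter")
```

Both calls should print ("avocado","butter"). Since no conversion happens at run time, the re-packing cost described above does not arise with this approach.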

    Full code example for the stringParser adapter:

    {-# OPTIONS_GHC -Wall #-}
    
    import Text.Parsec
    import Data.Text (Text)
    import qualified Data.Text as Text
    
    stringParser :: (Monad m) => ParsecT Text u m a -> ParsecT String u m a
    stringParser p = mkPT $ \st -> (fmap . fmap . fmap) outReply $ runParsecT p (inState st)
      where inState :: State String u -> State Text u
            inState  (State i pos u) = State (Text.pack i) pos u
            outReply :: Reply Text u a -> Reply String u a
            outReply (Ok a (State i pos u) e) = Ok a (State (Text.unpack i) pos u) e
            outReply (Error e) = Error e
    
    myTextParser :: Parsec Text () String
    myTextParser = (:) <$> oneOf "abc" <*> many letter
    
    myStringParser :: Parsec String () (String, String)
    myStringParser = (,) <$> p <* spaces <*> p
      where p = stringParser myTextParser
    
    main :: IO ()
    main = do
      -- parseTest prints the result (or the parse error) itself.
      parseTest myStringParser "avocado butter"
      parseTest myStringParser "apple error"
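
The bracketing trick is not specific to Text and String. The following is a sketch only: convertStream, textParser, textWord and stringDigits are hypothetical names, not part of parsec. It takes the two conversion functions as arguments, so the same code adapts a parser in either direction. The performance caveat still applies, because every call converts the whole remaining input.

```haskell
import Text.Parsec
import Data.Text (Text)
import qualified Data.Text as Text

-- Hypothetical helper: run a parser over stream type t inside a parser over
-- stream type s, given conversions in both directions.
convertStream :: Monad m => (s -> t) -> (t -> s) -> ParsecT t u m a -> ParsecT s u m a
convertStream to from p =
  mkPT $ \st -> (fmap . fmap . fmap) outReply (runParsecT p (inState st))
  where
    -- Convert the remaining input before running the wrapped parser.
    inState (State i pos u) = State (to i) pos u
    -- Convert the leftover input back; errors pass through unchanged.
    outReply (Ok a (State i pos u) e) = Ok a (State (from i) pos u) e
    outReply (Error e) = Error e

-- The adapter from the answer, expressed via the helper.
stringParser :: Monad m => ParsecT Text u m a -> ParsecT String u m a
stringParser = convertStream Text.pack Text.unpack

-- The opposite direction: use a String parser in a Text context.
textParser :: Monad m => ParsecT String u m a -> ParsecT Text u m a
textParser = convertStream Text.unpack Text.pack

textWord :: Parsec Text () String
textWord = (:) <$> oneOf "abc" <*> many letter

stringDigits :: Parsec String () String
stringDigits = many1 digit

main :: IO ()
main = do
  parseTest ((,) <$> stringParser textWord <* spaces <*> stringParser textWord) "avocado butter"
  parseTest (textParser stringDigits) (Text.pack "2024 rest")
```

The first call should print ("avocado","butter") and the second "2024".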