Search code examples
haskellparsec

Eagerness of separator parser in `sepBy` in Parsec


This is a question pertaining to a more general question of parser precedence in Haskell's Parsec, so feel free to respond with more generality about my problem.

Let's say I have a p :: Parsec String () Int which consumes every character in a String and sums its ASCII values at the end. I want to turn this p into another parser of the same type which computes the result of this ASCII summing on substrings which are separated by the newline character, and takes their product. E.g. AB\nZ\nCEF parses to (65+66)*(90)*(67+69+70)=2428740.

I could use lines on the input, map the elements to their parse values, and take their product, but I want to keep this Parsec-native; at the very least I would want a general function Parsec String u a -> Parsec [String] u [a].

I could use sepBy p newline, but this will only work if my p was specifically designed to reject parsing \n, which isn't very compositionally minded.


Solution

  • Parsec doesn't explicitly manage precedence of its parsers. Rather, it combines parsers using various control structure combinators, like sequences or loops, so that a sequence p >> q means "first run parser p, and once it's successfully completed, run parser q; if either fails, fail the whole operation". Control over the input stream comes from the bottom up. A parser decides when it's done, whether it succeeds or fails, and how much of the input stream it consumes, and the parent combinator's control over a parser is basically limited to an all-or-nothing decision: either accept what the parser did or reject it entirely.

    This means that a particular parser, like your p, that gobbles the rest of the stream, can't be controlled by a parent combinator, like sepBy. All sepBy can do is run it to completion, and then check if the separator parser succeeds before running p again.

    The correct way to write a Parsec parser to do what you want is to modify p so that it rejects newlines. This may not strike you as "compositionally minded", but it's how Parsec works, and when you write more realistic Parsec parsers, you'll discover that the compositional aspects for Parsec parsers rely on individual parsers playing nicely with the "shared resource" of the input stream. A Parsec parser must make the correct local decision about how much of the input stream to parse. You'll also find that higher level combinators will frequently need to be written with non-local knowledge about their composite parsers, to avoid falling into "try hell". I'm thinking here specifically about ordering of parsers in a chain of <|> alternatives.

    This is just a fundamental limitation of the Parsec design, and it's seen as a necessary trade off between "compositionality" and "practicality/efficiency".

    That being said, don't forget that you're programming in a functional programming language, parsers are just values, and functions can create them. If you want a reusable version of p, try:

    type Parser = Parsec String ()
    
    sumWhile :: (Char -> Bool) -> Parser Int
    sumWhile p = sum . map ord <$> many (satisfy p)
    
    p = sumWhile (/= '\n')
    

    This example isn't so great because, honestly, this component isn't worth reusing, but this technique is helpful for more complex reusable components.