Haskell Parsec - parsing two lists of stuff

I'm using Advent of Code part 16 as an excuse to learn how to use Parsec, but I'm stumbling on how to handle this specific case.

The input is on the following format:

Before: [3, 2, 3, 0]
2 3 1 1
After:  [3, 2, 3, 0]

Before: [1, 0, 2, 1]
7 0 1 1
After:  [1, 1, 2, 1]

...

Before: [0, 0, 2, 1]
6 2 3 1
After:  [0, 6, 2, 1]



5 0 2 3
5 1 3 1
...
5 3 2 2

In other words, first a number of groups of three lines that parse into a structure, separated by blank lines, then three blank lines, then a number of lines of four digits.

I have working parsers for each of the structures - Sample and MaskedOperation, with parsers sample and maskedOp respectively¹ - but I can't figure out how to put them together to parse it into a ([Sample], [MaskedOperation]).

I tried the following:

parseInput :: GenParser Char st ([Sample], [MaskedOperation])
parseInput = do
  samples <- sample `sepBy` (count 2 newline) <* count 3 newline
  operations <- maskedOp `sepBy` newline
  return (samples, operations)

but it fails when it reaches the three newlines, expecting another sample:

(line 3221, column 1):
unexpected "\n"
expecting "Before:"

How do I tell parsec that I want to take as many as it can, then consume the separator (the extra newlines), then start reading other stuff?

¹ Read the Advent of Code problem for context; the names are not important.

Solution

I am afraid that you cannot use sepBy here.

Let's simplify it to parsing 'a' as samples and 'b' as newlines. You are going to parse strings like

a b b a b b b c b c b c b

So what does the parser think while traversing on such a string? Let's go through it:

Parsing a or an empty sequence

[a] b b a b b b c b c b c b

Oh, an a. The sequence is non empty, so I will parse many "bb" >> "a" from now.

a [b b a] b b b c b c b c b

Success! Let's get some more or finish

a b b a [b b] b c b c b c b

Okay, I have found another bb, so the sequence continues. Parsing a

a b b a b b [b] c b c b c b

wat

The problem is that parser won't rollback unless explicitly asked to do so. To give parser rollback ability you must mark it with try combinator, otherwise it won't be able to "unconsume" consumed input.

For now I don't see any better way than rewrite the sepBy combinator to make it aware that after parsing each separator it may need to return it back to the buffer if parsing separator >> target fails:

sepTry a sep = sepTry1 a sep <|> pure []
sepTry1 a sep = liftA2 (:) a (many (try $ sep *> a))

Note that parsing a must be included in try section – this is where the failure actually is triggered.

To visualize the difference let's see same scenario but with sepTry instead:

...

a [b b a] b b b c b c b c b

Success! Let's try to get one more if possible

a b b a ![b b] b c b c b c b

Okay, I have found another bb, so the sequence continues. Parsing a

a b b a !b b [b] c b c b c b

Not what I expected. Return failure and move cursor to back exclamation mark.

a b b a ![]b b b c b c b c b

Failed parsing bba, parsing sequence finished. Parse bbb

a b b a [b b b] c b c b c b

Success!

In your case after each Sample this parser will attempt to read 2 newlines with Sample after them, or in case of failure 3 newlines and continue to MaskedOperations