I'm using Advent of Code part 16 as an excuse to learn how to use Parsec, but I'm stumbling on how to handle this specific case.
The input is on the following format:
Before: [3, 2, 3, 0]
2 3 1 1
After: [3, 2, 3, 0]
Before: [1, 0, 2, 1]
7 0 1 1
After: [1, 1, 2, 1]
...
Before: [0, 0, 2, 1]
6 2 3 1
After: [0, 6, 2, 1]
5 0 2 3
5 1 3 1
...
5 3 2 2
In other words, first a number of groups of three lines that parse into a structure, separated by blank lines, then three blank lines, then a number of lines of four digits.
I have working parsers for each of the structures - Sample
and MaskedOperation
, with parsers sample
and maskedOp
respectively1 - but I can't figure out how to put them together to parse it into a ([Sample], [MaskedOperation])
.
I tried the following:
parseInput :: GenParser Char st ([Sample], [MaskedOperation])
parseInput = do
samples <- sample `sepBy` (count 2 newline) <* count 3 newline
operations <- maskedOp `sepBy` newline
return (samples, operations)
but it fails when it reaches the three newlines, expecting another sample:
(line 3221, column 1):
unexpected "\n"
expecting "Before:"
How do I tell parsec that I want to take as many as it can, then consume the separator (the extra newlines), then start reading other stuff?
1 Read the Advent of Code problem for context; the names are not important.
I am afraid that you cannot use sepBy
here.
Let's simplify it to parsing 'a' as samples and 'b' as newlines. You are going to parse strings like
a b b a b b b c b c b c b
So what does the parser think while traversing on such a string? Let's go through it:
Parsing
a
or an empty sequence
[a] b b a b b b c b c b c b
Oh, an
a
. The sequence is non empty, so I will parsemany
"bb" >> "a"
from now.
a [b b a] b b b c b c b c b
Success! Let's get some more or finish
a b b a [b b] b c b c b c b
Okay, I have found another
bb
, so the sequence continues. Parsinga
a b b a b b [b] c b c b c b
wat
The problem is that parser won't rollback unless explicitly asked to do so.
To give parser rollback ability you must mark it with try
combinator, otherwise it won't be able to "unconsume" consumed input.
For now I don't see any better way than rewrite the sepBy
combinator to make it aware that after parsing each separator it may need to return it back to the buffer if parsing separator >> target
fails:
sepTry a sep = sepTry1 a sep <|> pure []
sepTry1 a sep = liftA2 (:) a (many (try $ sep *> a))
Note that parsing a
must be included in try
section – this is where the failure actually is triggered.
To visualize the difference let's see same scenario but with sepTry
instead:
...
a [b b a] b b b c b c b c b
Success! Let's try to get one more if possible
a b b a ![b b] b c b c b c b
Okay, I have found another
bb
, so the sequence continues. Parsinga
a b b a !b b [b] c b c b c b
Not what I expected. Return failure and move cursor to back exclamation mark.
a b b a ![]b b b c b c b c b
Failed parsing
bba
, parsing sequence finished. Parsebbb
a b b a [b b b] c b c b c b
Success!
In your case after each Sample
this parser will attempt to read 2 newlines with Sample
after them, or in case of failure 3 newlines and continue to MaskedOperation
s