I've got a Parsec parser that I'm writing in mostly applicative style. In one case (the only time I'm using sepBy) I'm having trouble with my eol parser. First, a few definitions:
eol =   try (string "\n\r")
    <|> try (string "\r\n")
    <|> string "\n"
    <|> string "\r"
    <?> "eol"
betaLine = string "BETA " *> (StrandPair <$> p_int <*> p_int <*> p_int <*> (p_int *> p_direction) <*> p_exposure) <* eol
Note: betaLine works perfectly (for brevity, I've left out the definitions of p_int and such):
*HmmPlus> parse betaLine "beta" "BETA 6 11 5 24 -1 oiiio\n"
Right (StrandPair {firstStart = 6, secondStart = 11, pairLength = 5, parallel = Antiparallel, exposure = [Exposed,Buried,Buried,Buried,Exposed]})
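Since eol is shared by both parsers, it is worth confirming that it behaves correctly on its own. The following is a minimal, self-contained sketch assuming the parsec package is installed; parseEol is a hypothetical helper that runs eol and requires it to consume the whole input. The try wrappers matter here: Parsec's string consumes the characters it has already matched before it fails, so without try a lone "\n" would commit to the "\n\r" branch and the string "\n" alternative would never be reached.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The eol parser from the question. The two-character alternatives are
-- wrapped in try so that a partial match (e.g. "\n" not followed by '\r')
-- backtracks instead of failing the whole choice.
eol :: Parser String
eol =   try (string "\n\r")
    <|> try (string "\r\n")
    <|> string "\n"
    <|> string "\r"
    <?> "eol"

-- Hypothetical helper: run eol and require that it consumes all of the input.
parseEol :: String -> Either ParseError String
parseEol = parse (eol <* eof) "eol-test"
```

Loading this in GHCi and evaluating parseEol "\r\n" should give Right "\r\n", and likewise for the other three line endings.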
The problem occurs with this other parser, hmmMatchEmissions:
hmmMatchEmissions = spaces *> (V.fromList <$> sepBy p_logProb spaces) <* eol <?> "matchEmissions"
*HmmPlus> parse hmmMatchEmissions "me" " 2.61196 4.43481 2.86148 2.75135 3.26990 2.87580 3.69681\n"
Left "me" (line 2, column 1):
unexpected end of input
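One way to see what is going on is to reproduce the failure in isolation. In the sketch below (assuming the parsec package), number is a hypothetical stand-in for p_logProb, which isn't shown: it parses an unsigned decimal such as 2.61196 and consumes nothing else. The result is a plain list rather than a Data.Vector, to keep it self-contained. The key detail is that Parsec's spaces skips every character satisfying isSpace, including '\n'. After the last value, sepBy runs its separator, spaces swallows the newline, and because the separator consumed input, sepBy now insists on another number. That number fails at end of input, so the whole parse fails at line 2, column 1 before eol is ever tried.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-in for p_logProb: an unsigned decimal such as 2.61196.
-- It consumes the number only, never any surrounding whitespace.
number :: Parser Double
number = read <$> ((++) <$> many1 digit <*> option "" fraction)
  where
    fraction = (:) <$> char '.' <*> many1 digit

eol :: Parser String
eol =   try (string "\n\r")
    <|> try (string "\r\n")
    <|> string "\n"
    <|> string "\r"
    <?> "eol"

-- Same structure as the original hmmMatchEmissions, returning a list.
-- The separator is spaces, which also matches '\n', so after the last
-- number it eats the newline and sepBy then demands one more number.
matchEmissions :: Parser [Double]
matchEmissions = spaces *> sepBy number spaces <* eol <?> "matchEmissions"
```

With the newline removed and eol dropped, as in the working variant, the final spaces-then-number attempt consumes nothing before it fails, so sepBy stops cleanly after the last value.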
Now, if I remove the <* eol from the parser definition, and remove the \n from the line, it does work:
*HmmPlus> parse hmmMatchEmissions "me" " 2.61196 4.43481 2.86148 2.75135 3.26990 2.87580 3.69681"
Right (fromList [NonZero 2.61196,NonZero 4.43481,NonZero 2.86148,NonZero 2.75135,NonZero 3.2699,NonZero 2.8758,NonZero 3.69681])
So, why is eol working in the case of betaLine but not hmmMatchEmissions?
I will note that this is the only place I'm using sepBy; could that be a clue?
Update: I've done the following, and it now fails differently :/
reqSpaces = many1 (oneOf " \t")
optSpaces = many (oneOf " \t")
hmmMatchEmissions = optSpaces *> (V.fromList <$> sepBy1 p_logProb reqSpaces) <* eol <?> "matchEmissions"
And here's the failure:
*HmmPlus> parse hmmMatchEmissions "me" " 0.123 0.124\n"
Left "me" (line 1, column 10):
unexpected "0"
expecting eol
I'll note that the unexpected 0 in column 10 is the first character of the 0.124 token.
The problem seems to be that your p_logProb parser also consumes the whitespace that follows the number. So, this is what happens during the parse:
 0.123 0.124\n
[]                 optSpaces
 [----]            p_logProb
       {           trying reqSpaces
       {           trying eol
                   failure: expecting eol
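Here is a small sketch reproducing this diagnosis (assuming the parsec package). Because p_logProb isn't shown, lexNumber is a hypothetical stand-in that behaves like a lexeme parser: it parses a number and then skips the blanks after it; number and matchEmissionsBad are likewise illustrative names. By the time sepBy1 tries its separator, the blank in front of 0.124 is already gone, so reqSpaces fails without consuming input, sepBy1 stops after one element, and eol is tried on the 0. For the input as written here, that is column 8.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Blank parsers from the question's update: spaces and tabs, never newlines.
reqSpaces, optSpaces :: Parser String
reqSpaces = many1 (oneOf " \t")
optSpaces = many (oneOf " \t")

-- Hypothetical number parser that consumes only the digits.
number :: Parser Double
number = read <$> ((++) <$> many1 digit <*> option "" fraction)
  where
    fraction = (:) <$> char '.' <*> many1 digit

-- Hypothetical lexeme-style p_logProb: the number plus any trailing blanks.
lexNumber :: Parser Double
lexNumber = number <* optSpaces

eol :: Parser String
eol =   try (string "\n\r")
    <|> try (string "\r\n")
    <|> string "\n"
    <|> string "\r"
    <?> "eol"

-- The updated hmmMatchEmissions from the question, returning a list.
matchEmissionsBad :: Parser [Double]
matchEmissionsBad =
  optSpaces *> sepBy1 lexNumber reqSpaces <* eol <?> "matchEmissions"
```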
The p_logProb parser should only consume the thing that it parses, namely the actual number. This will lead to the intended parse:
 0.123 0.124\n
[]                 optSpaces
 [---]             p_logProb
      []           reqSpaces
       [---]       p_logProb
            {      trying reqSpaces
            #      eol
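The corrected arrangement can be sketched as follows, assuming the parsec package. Here number is a hypothetical stand-in for a corrected p_logProb that consumes the digits of a value and nothing else, so all blank handling is left to optSpaces and reqSpaces from the question's update; the result is a plain list rather than a Data.Vector. reqSpaces now finds the blank in front of 0.124, and after the last value it meets the newline, fails without consuming it, and leaves it for eol.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Blank parsers from the question's update: spaces and tabs, never newlines.
reqSpaces, optSpaces :: Parser String
reqSpaces = many1 (oneOf " \t")
optSpaces = many (oneOf " \t")

-- Hypothetical corrected p_logProb: consumes the number and nothing else.
number :: Parser Double
number = read <$> ((++) <$> many1 digit <*> option "" fraction)
  where
    fraction = (:) <$> char '.' <*> many1 digit

eol :: Parser String
eol =   try (string "\n\r")
    <|> try (string "\r\n")
    <|> string "\n"
    <|> string "\r"
    <?> "eol"

-- The updated hmmMatchEmissions, with a number parser that leaves blanks alone.
matchEmissions :: Parser [Double]
matchEmissions =
  optSpaces *> sepBy1 number reqSpaces <* eol <?> "matchEmissions"
```

One caveat visible in this sketch: a trailing blank before the newline still makes the parse fail, because reqSpaces consumes the blank and sepBy1 then demands another value. If such lines can occur, Parsec's sepEndBy1, which allows an optional trailing separator, is one way to accept them.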