
Parsec trouble with end of line


I've got a Parsec parser that I'm writing in mostly applicative style. In one case (the only time I'm using sepBy) I'm having trouble with my eol parser. First, a few definitions:

eol =   try (string "\n\r")
    <|> try (string "\r\n")
    <|> string "\n"
    <|> string "\r"
    <?> "eol"

betaLine = string "BETA " *> (StrandPair <$> p_int <*> p_int <*> p_int <*> (p_int *> p_direction) <*> p_exposure) <* eol

Note: betaLine works perfectly (for brevity, I've left out the definitions of p_int and such):

*HmmPlus> parse betaLine "beta" "BETA 6 11 5 24 -1 oiiio\n"
Right (StrandPair {firstStart = 6, secondStart = 11, pairLength = 5, parallel =   Antiparallel, exposure = [Exposed,Buried,Buried,Buried,Exposed]})

The problem occurs with this other parser, hmmMatchEmissions:

hmmMatchEmissions = spaces *> (V.fromList <$> sepBy p_logProb spaces) <* eol <?> "matchEmissions"

*HmmPlus> parse hmmMatchEmissions "me" "      2.61196  4.43481  2.86148  2.75135  3.26990  2.87580  3.69681\n"
Left "me" (line 2, column 1):
unexpected end of input

Now, if I remove the <* eol from the parser definition, and remove the \n from the line, it does work:

*HmmPlus> parse hmmMatchEmissions "me" "      2.61196  4.43481  2.86148  2.75135  3.26990  2.87580  3.69681"
Right (fromList [NonZero 2.61196,NonZero 4.43481,NonZero 2.86148,NonZero 2.75135,NonZero 3.2699,NonZero 2.8758,NonZero 3.69681])

So, why is eol working in the case of betaLine but not hmmMatchEmissions?

I will note that this is the only place I'm using sepBy; could that be a clue?

Update: I've done the following, and it now fails differently :/

reqSpaces = many1 (oneOf " \t")

optSpaces = many (oneOf " \t")

hmmMatchEmissions = optSpaces *> (V.fromList <$> sepBy1 p_logProb reqSpaces) <* eol <?> "matchEmissions"

And here's the failure:

*HmmPlus> parse hmmMatchEmissions "me" "  0.123  0.124\n"
Left "me" (line 1, column 10):
unexpected "0"
expecting eol

I'll note that the unexpected 0 in column 10 is the first character of the 0.124 token.


Solution

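  • The problem seems to be that your p_logProb parser also consumes the whitespace that follows the number. After the first number, reqSpaces therefore finds nothing left to match, sepBy1 stops with a single element, and eol is tried at the 0 of 0.124. So, this is what happens during the parse: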

      0.123  0.124\n
    []               optSpaces
      [-----]        p_logProb
             {       trying reqSpaces
             {       trying eol
                     failure: expecting eol
    

    The p_logProb parser should only consume the thing that it parses, namely the actual number. This will lead to the intended parse:

      0.123  0.124\n
    []               optSpaces
      [---]          p_logProb
           []        reqSpaces
             [---]   p_logProb
                  {  trying reqSpaces
                  #  eol
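
The two traces above can be reproduced with a small sketch. The question does not show p_logProb, so the version here is a hypothetical stand-in that parses a plain unsigned decimal into a Double; it uses an ordinary list instead of V.fromList to stay free of extra dependencies, and emissionsWith and greedyEmissions are names made up for the comparison. eol, reqSpaces and optSpaces are copied from the question.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-in for p_logProb: reads an unsigned decimal such as
-- "0.123" and stops right after the last digit, leaving whitespace untouched.
p_logProb :: Parser Double
p_logProb = do
  whole <- many1 digit
  frac  <- option "" ((:) <$> char '.' <*> many1 digit)
  return (read (whole ++ frac))

-- eol, reqSpaces and optSpaces as defined in the question.
eol :: Parser String
eol =   try (string "\n\r")
    <|> try (string "\r\n")
    <|> string "\n"
    <|> string "\r"
    <?> "eol"

reqSpaces, optSpaces :: Parser String
reqSpaces = many1 (oneOf " \t")
optSpaces = many (oneOf " \t")

-- The line parser, parameterised over the number parser so that both
-- behaviours can be compared (emissionsWith is a name made up for this sketch).
emissionsWith :: Parser Double -> Parser [Double]
emissionsWith number =
  optSpaces *> sepBy1 number reqSpaces <* eol <?> "matchEmissions"

-- Number parser consumes only the number, so reqSpaces finds its separator.
hmmMatchEmissions :: Parser [Double]
hmmMatchEmissions = emissionsWith p_logProb

-- Number parser also eats trailing whitespace, which reproduces the failure.
greedyEmissions :: Parser [Double]
greedyEmissions = emissionsWith (p_logProb <* spaces)
```

Loaded into GHCi, parse hmmMatchEmissions "me" "  0.123  0.124\n" should give Right [0.123,0.124], while the same input with greedyEmissions fails at line 1, column 10, the position reported in the question. The first error in the question (line 2, column 1) likely has a related cause: Parsec's spaces skips newlines as well as blanks, so the \n is consumed before eol gets a chance to match it.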