Parsec parse string `sepBy` endline


I have run into strange behavior in Parsec when parsing simple strings.

Here is an example of the input:

  1 C1           1.1650     2.7470    -0.1840 ca         1 MOL       0.408200
  2 N1          -0.0550     2.1750    -0.0380 nb         1 MOL      -0.665000
  3 C2          -0.2180     0.8450     0.1920 ca         1 MOL       0.450600
  4 C3          -1.6310     0.3330     0.3310 c3         1 MOL      -0.140700

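My parser is:

import Text.Parsec
import Text.Parsec.String (Parser)

atom :: Parser (Int, String, Double)
atom = do
    str <- optional spaces *> many1 (letter <|> digit <|> oneOf "-+.") `sepBy` spaces

    let atomId = read $ head str :: Int
    let charge = (read . head . reverse) str :: Double
    return (atomId, str !! 1, charge)

records :: Parser [(Int, String, Double)]
records = atom `sepEndBy1` newline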

I need to parse each line with the atom parser. If I apply atom to a single line, it works.

But if I try to use the records parser, it looks like the first atom call eats the whole input. So I get (1,C1,-0.140700) instead of the list [(1,C1,0.408200),(2,N1,-0.665000), ...].

P.S. I don't understand how Parsec can get past the \n characters at all in this case. For example, if we have

onlyForTest = (many1 $ (letter <|> digit <|> oneOf "-+.")) `sepBy` spaces

and test it on this input:

*Main> parseTest onlyForTest  "bla bl\na bla"

Output:

["bla","bl","a","bla"]

But \n is not the separator I passed to sepBy!
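The separator itself is what eats the newline: in parsec, spaces is skipMany space and space is satisfy isSpace, so \n (and \r) count as separator characters. The sketch below, assuming parsec 3 (Text.Parsec, Text.Parsec.String), runs the token parser from onlyForTest once with spaces and once with a separator that accepts only spaces and tabs. The names word, hspace1, allWords and lineWords are made up for this example.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Token parser taken from onlyForTest in the question.
word :: Parser String
word = many1 (letter <|> digit <|> oneOf "-+.")

-- Hypothetical horizontal-only separator: one or more spaces or tabs.
-- Unlike 'spaces', it never consumes '\n' or '\r'.
hspace1 :: Parser ()
hspace1 = skipMany1 (oneOf " \t")

-- Separator 'spaces' (skipMany of isSpace chars): the newline is swallowed
-- as part of a separator, so tokens from both lines end up in one list.
allWords :: String -> Either ParseError [String]
allWords = parse (word `sepBy` spaces) ""

-- Separator hspace1: parsing stops in front of the newline.
lineWords :: String -> Either ParseError [String]
lineWords = parse (word `sepBy` hspace1) ""
```

On the input from the question, allWords returns Right ["bla","bl","a","bla"], while lineWords stops before the newline and returns Right ["bla","bl"] (parse does not require the whole input to be consumed).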


Solution

  • As @freestyle said, spaces uses Data.Char.isSpace to decide whether a character should be consumed, and that includes both \n and \r.

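    Use a separator built from oneOf " \t" instead of spaces (for example skipMany1 (oneOf " \t"), since oneOf on its own matches exactly one character and the columns are padded with several), and endOfLine instead of newline, so that both \n and \r\n line endings are accepted. Oh, and optional is redundant in optional spaces, because spaces already consumes zero or more characters.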
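Putting these suggestions together might look like the following sketch. It assumes parsec 3 (where endOfLine is exported from Text.Parsec), and the helper names hspace, hspace1 and field are hypothetical. It also swaps the question's read/head/!! for Text.Read.readMaybe and a pattern match, so a malformed line becomes a parse error instead of a runtime exception. Fields are split with sepEndBy1 rather than sepBy1 so that trailing blanks at the end of a line are tolerated.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Read (readMaybe)

-- Zero or more spaces/tabs; never consumes '\n' or '\r'.
hspace :: Parser ()
hspace = skipMany (oneOf " \t")

-- One or more spaces/tabs, used between fields.
hspace1 :: Parser ()
hspace1 = skipMany1 (oneOf " \t")

-- One field, using the same character set as the question's parser.
field :: Parser String
field = many1 (letter <|> digit <|> oneOf "-+.")

-- One record line: leading blanks, then blank-separated fields.
-- hspace already accepts zero characters, so no 'optional' is needed.
atom :: Parser (Int, String, Double)
atom = do
  fs <- hspace *> (field `sepEndBy1` hspace1)
  case fs of
    -- First field is the id, second the atom name, last the charge.
    (i : name : rest@(_ : _)) ->
      case (readMaybe i, readMaybe (last rest)) of
        (Just atomId, Just charge) -> return (atomId, name, charge)
        _ -> fail ("cannot read id or charge from: " ++ unwords fs)
    _ -> fail "expected at least three fields on a line"

-- Records separated by "\n" or "\r\n"; a trailing line ending is allowed.
records :: Parser [(Int, String, Double)]
records = atom `sepEndBy1` endOfLine
```

Running parseTest records on the four sample lines prints [(1,"C1",0.4082),(2,"N1",-0.665),(3,"C2",0.4506),(4,"C3",-0.1407)], and the result is the same when the lines end in \r\n.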