I have run into strange behavior in parsec when parsing simple strings.
Example input:
1 C1 1.1650 2.7470 -0.1840 ca 1 MOL 0.408200
2 N1 -0.0550 2.1750 -0.0380 nb 1 MOL -0.665000
3 C2 -0.2180 0.8450 0.1920 ca 1 MOL 0.450600
4 C3 -1.6310 0.3330 0.3310 c3 1 MOL -0.140700
My parser is:

    atom = do
        str <- optional spaces *> (many1 $ (letter <|> digit <|> oneOf "-+.")) `sepBy` spaces
        let id = read $ head str :: Int
        let charge = (read.head.reverse) str :: Double
        return (id,(str !! 1),charge)

    records = atom `sepEndBy1` newline

I need to parse each line with the `atom` parser. Applying `atom` to a single line works. But when I use the `records` parser, the first `atom` seems to eat the whole input, so I get (1,C1,-0.140700) instead of the list [(1,C1,0.408200),(2,N1,-0.665000)] and so on.
P.S. I can't understand how parsec gets past the \n characters at all in this case. For example, if we have

    onlyForTest = (many1 $ (letter <|> digit <|> oneOf "-+.")) `sepBy` spaces

and test it on this example:

    *Main> parseTest onlyForTest "bla bl\na bla"

Output:

    ["bla","bl","a","bla"]

But the \n character is not a separator in `sepBy`!

As @freestyle said, `spaces` uses `Data.Char.isSpace` to determine if a character should be consumed, and that includes both \n and \r.
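
A quick way to check this yourself is the small sketch below. It only uses `isSpace` from `Data.Char` and the standard parsec combinators; the test strings are made up for illustration.

```haskell
import Data.Char (isSpace)
import Text.Parsec
import Text.Parsec.String (Parser)

-- `spaces` skips every character for which isSpace is True,
-- so it runs straight through line breaks.
skipThenWord :: Parser String
skipThenWord = spaces *> many1 letter

main :: IO ()
main = do
  -- Space, tab, newline and carriage return all count as whitespace.
  print (map isSpace " \t\n\r")
  -- The leading "\n\r\t " is consumed by `spaces` before the word is read.
  print (parse skipThenWord "" "\n\r\t abc")
```

Both the newline and the carriage return are reported as whitespace, which is why `sepBy spaces` treats a line break like any other separator.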

Use `oneOf " \t"` instead of `spaces` and `endOfLine` instead of `newline`. Oh, and `optional` is extraneous in `optional spaces`, because `spaces` consumes zero or more characters.
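
Putting that advice together, a minimal sketch of a line-aware parser could look like the following. The helpers `hspaces` and `field` are names introduced here for illustration, not part of parsec, and the switch from `read` to `readMaybe` is my own assumption to avoid runtime crashes on malformed fields. It also assumes every line has at least three fields, with the id first and the charge last.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Read (readMaybe)

-- Hypothetical helper: horizontal whitespace only.
-- Unlike `spaces`, it never consumes '\n' or '\r'.
hspaces :: Parser ()
hspaces = skipMany (oneOf " \t")

-- Hypothetical helper: one field, followed by any blanks on the same line.
field :: Parser String
field = many1 (letter <|> digit <|> oneOf "-+.") <* hspaces

atom :: Parser (Int, String, Double)
atom = do
  fields <- hspaces *> many1 field
  case fields of
    (i : name : _ : _) ->
      -- The first field is the id, the last one is the charge.
      case (readMaybe i, readMaybe (last fields)) of
        (Just n, Just charge) -> return (n, name, charge)
        _ -> fail "expected an integer id and a numeric charge"
    _ -> fail "expected at least three fields"

-- endOfLine accepts both "\n" and "\r\n".
records :: Parser [(Int, String, Double)]
records = atom `sepEndBy1` endOfLine

main :: IO ()
main = do
  let input = unlines
        [ "1 C1 1.1650 2.7470 -0.1840 ca 1 MOL 0.408200"
        , "2 N1 -0.0550 2.1750 -0.0380 nb 1 MOL -0.665000"
        , "3 C2 -0.2180 0.8450 0.1920 ca 1 MOL 0.450600"
        , "4 C3 -1.6310 0.3330 0.3310 c3 1 MOL -0.140700"
        ]
  print (parse (records <* eof) "" input)
```

Because `field` stops at the line break, each `atom` now ends at the end of its own line, and `records` should return one tuple per input line instead of a single tuple built from the whole input.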