I'm using Text.ParserCombinators.Parsec and Text.XHtml to parse an input like this:
This is the first paragraph example\n with two lines\n \n And this is the second paragraph\n
And my output should be:
<p>This is the first paragraph example\n
with two lines\n</p>
<p>And this is the second paragraph\n</p>
I defined:
line = do { t <- manyTill anyChar newline
          ; return t
          }
paragraph = do { t <- many1 line
               ; return (p << t)
               }
But it returns:
<p>This is the first paragraph example\n
with two lines\n\n And this is the second paragraph\n</p>
What is wrong? Any ideas?
Thanks!
According to the documentation for manyTill, it runs its first argument zero or more times, so two newlines in a row are still valid input: on the empty line your line parser succeeds with an empty string instead of failing.
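A small sketch that reproduces this behaviour, in case it helps to see it directly. The name questionLine is only an illustrative stand-in for the line parser from the question, and the sample input is made up:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Same behaviour as the question's line parser (hypothetical name).
questionLine :: Parser String
questionLine = manyTill anyChar newline

main :: IO ()
main = do
  -- An empty line still matches and yields the empty string.
  print (parse questionLine "" "\n")
  -- So many1 runs straight through the blank line between paragraphs.
  print (parse (many1 questionLine) "" "first\nsecond\n\nthird\n")
```

The second result contains an empty string where the blank line was, which is why both paragraphs end up inside a single <p>.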
You're probably looking for something like many1Till (by analogy with many1 versus many), but it doesn't seem to exist in the Parsec library, so you may need to roll your own (warning: I don't have GHC on this machine, so this is completely untested):
many1Till p end = do
    first <- p
    rest <- p `manyTill` end
    return (first : rest)
or a terser way (liftM2 comes from Control.Monad):
many1Till p end = liftM2 (:) p (p `manyTill` end)
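Here is one way the pieces might fit together, as a hedged sketch rather than a definitive fix. One caveat: if p is anyChar, the first p will itself consume the newline of an empty line, so this sketch assumes noneOf "\n" for the characters of a line. The names nonEmptyLine and document are made up for illustration, and paragraphs are returned as plain lists of lines instead of being wrapped with Text.XHtml, to keep the example self-contained:

```haskell
import Control.Monad (liftM2)
import Text.Parsec
import Text.Parsec.String (Parser)

-- One or more p, followed by end (the helper suggested above).
many1Till :: Parser a -> Parser end -> Parser [a]
many1Till p end = liftM2 (:) p (p `manyTill` end)

-- A line with at least one character. Using noneOf "\n" is an
-- assumption: it stops the first character from being a newline,
-- so this parser fails on a blank line without consuming input.
nonEmptyLine :: Parser String
nonEmptyLine = many1Till (noneOf "\n") newline

-- A paragraph is a run of non-empty lines; the blank lines that
-- follow it are skipped.
paragraph :: Parser [String]
paragraph = many1 nonEmptyLine <* many newline

-- Hypothetical top-level parser: zero or more paragraphs.
document :: Parser [[String]]
document = many newline *> many paragraph <* eof

main :: IO ()
main = print (parse document "" "first\nsecond\n\nthird\n")
-- prints: Right [["first","second"],["third"]]
```

In the original program each inner list could then be passed to p << ... to produce one <p> element per paragraph.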