
Haskell - Parsec Parsing <p> element


I'm using Text.ParserCombinators.Parsec and Text.XHtml to parse an input like this:

This is the first paragraph example\n
with two lines\n
\n
And this is the second paragraph\n

And my output should be:

<p>This is the first paragraph example\n with two lines\n</p> <p>And this is the second paragraph\n</p>

I defined:


line = do
    t <- manyTill anyChar newline
    return t

paragraph = do
    t <- many1 line
    return (p << t)

But it returns:

<p>This is the first paragraph example\n with two lines\n\n And this is the second paragraph\n</p>

What is wrong? Any ideas?

Thanks!


Solution

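  • According to its documentation, manyTill applies its first argument zero or more times, so an empty line (a newline immediately after another newline) still parses successfully as a line with no characters. Your line parser therefore never fails on the blank line, and many1 line keeps consuming straight across the paragraph break.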

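    You're probably looking for something like many1Till (analogous to many1 versus many), but it doesn't seem to exist in the Parsec library, so you may need to roll your own. (Warning: I don't have ghc on this machine, so this is completely untested.) Note that the element parser p must not accept a newline itself: with anyChar, the first p would consume the blank line's newline, so use something like noneOf "\n" instead.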

    many1Till p end = do
        first <- p
        rest  <- p `manyTill` end
        return (first : rest)
    

    or, more tersely, with liftM2 from Control.Monad:

    many1Till p end = liftM2 (:) p (p `manyTill` end)
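
To show how many1Till could slot into the original problem, here is a minimal sketch of a paragraph parser built on it. It is one possible arrangement, not a tested drop-in for the question's code. The names line1, paragraph and paragraphs are made up for illustration. It imports Text.Parsec rather than Text.ParserCombinators.Parsec, returns plain lists of lines instead of Text.XHtml values, and assumes every line (including the last) ends with a newline, as in the question's input.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Like manyTill, but requires at least one p before end.
many1Till :: Parser a -> Parser end -> Parser [a]
many1Till p end = do
    first <- p
    rest  <- p `manyTill` end
    return (first : rest)

-- A non-empty line: one or more non-newline characters, then a newline.
-- noneOf "\n" is used instead of anyChar so that the first character
-- can never be the newline of a blank line.
line1 :: Parser String
line1 = many1Till (noneOf "\n") newline

-- A paragraph: one or more non-empty lines, then any blank lines after it.
paragraph :: Parser [String]
paragraph = many1 line1 <* many newline

-- A whole document: optional leading blank lines, then paragraphs up to EOF.
paragraphs :: Parser [[String]]
paragraphs = many newline *> many paragraph <* eof
```

In GHCi, `parse paragraphs "" "a\nb\n\nc\n"` gives `Right [["a","b"],["c"]]`: the blank line now ends the first paragraph because line1 fails on it without consuming input. Each inner list of lines can then be wrapped in a `<p>` element in whatever way suits the rest of the program.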