I want to write a simple parser for a subset of Jade, generating some XmlHtml for further processing.
The parser is quite simple, but as often with Parsec, a bit long. Since I don't know if I am allowed to make such long code posts, I have the full working example here.
I've dabbled with Parsec before, but rarely successfully. Right now, I don't quite understand why it seems to swallow following lines. For example, the jade input of
.foo.bar
| Foo
| Bar
| Baz
tested with parseTest tag txt
, returns this:
Element {elementTag = "div", elementAttrs = [("class","foo bar")], elementChildren = [TextNode "Foo"]}
My parser seems to be able to deal with any kind of nesting, but never more than one line. What did I miss?
If Parsec cannot match the remaining input, it will stop parsing at that point and simply ignore that input. Here, the problem is that after having parsed a tag, you don't consume the whitespace in the beginning of the line before the next tag, so Parsec cannot parse the remaining input and bails. (There might also be other issues, I can't test the code right now)
There are many ways of adding something that consumes the spaces, but I am not familiar with Jade so I cannot tell you which way is the "correct" way (I don't know how the indentation syntax works) but just adding whiteSpace
somewhere at the end of tag
should do it.
By the way, you should consider splitting up your parser into a Lexer and Parser. The Lexer produces a token stream like [Ident "bind", OpenParen, Ident "tag", Equals, StringLiteral "longname", ..., Indentation 1, ...]
and the parser parses that token stream (Yes, Parsec can parse lists of anything). I think that it would make your job easier/less confusing.