I am trying to parse a string in Haskell which represents a tree. Each node is on a line, where the indentation determines the nesting (e.g. like the syntax of Python or Haskell).
A successful parsing result should be of the type [Tree]
(a forest of rose trees), where
data Tree = Tree String [Tree]
deriving Show
For example, the string
A
B
C
D
E
F
should result in
[Tree "A" [Tree "B" [], Tree "C" []], Tree "D" [Tree "E" [Tree "F" []]]]
With the Haskell indents package and the help of an article about how to parse indented trees, I could parse examples like this.
The problem is that my current implementation successfully parses the string
A
B
C
D
as
[Tree "A" [Tree "B" [Tree "C" []]], Tree "D" []]
The node D
is parsed as another root. However, I would like to make the parsing fail, since D
is unindented relative to C
, but it does not match the indentation level of A
or B
.
My implementation is as follows (simplifications: tree labels are nonempty and have alphanumeric characters only; not every kind of Unicode newline is supported).
import qualified Control.Applicative as A
import qualified Control.Monad as M
import qualified Text.Parsec as P
import qualified Text.Parsec.Indent as I
parse :: String -> Either P.ParseError [Tree]
parse = I.runIndent "" . P.runParserT forest () ""
forest = P.many tree A.<* P.eof
tree = spacing A.*> I.withBlock Tree node tree
spacing = P.many P.newline A.*> indentation
indentation = P.many $ P.char ' '
node = label A.<* spacing
label = P.many1 P.alphaNum A.<* lineEnd
lineEnd = M.void P.newline P.<|> P.eof
How can this parser be changed to accept only unindents which match some outer indentation level? Thanks in advance.
Simply replacing the definition of forest
by
forest = I.block tree A.<* P.eof
seems to do the trick. I.block
here makes sure that the roots of the source string are indented to the same level.
The example
A
B
C
D
now fails with
(line 4, column 7):
unexpected 'D'
expecting " " or end of input
not indented or indentation doesn't match
The problem with P.many tree
is that a tree
can be arbitrarily indented. As soon as an unindentation of a tree
does not match any outer indentation level (D
in the example), the parser happily grows another top-level tree
(root) of the forest
instead of failing.