Search code examples
parsinghaskellparsec

parsec using between to parse parens


If I wanted to parse a string with multiple parenthesized groups into a list of strings holding each group, for example

"((a b c) a b c)"

into

["((a b c) a b c)","( a b c)"]

How would I do that using parsec? The use of between looks nice but it does not seem possible to separate with a beginning and end value.


Solution

  • While jozefg's solution is almost identical to what I came up with (and I completely agree to all his suggestions), there are some small differences that made me think that I should post a second answer:

    1. Due to the expected result of the initial example, it is not necessary to treat space-separated parts as individual subtrees.
    2. Further it might be interesting to see the part that actually computes the expected results (i.e., a list of strings).

    So here is my version. As already suggested by jozefg, split the task into several sub-tasks. Those are:

    1. Parse a string into an algebraic data type representing some kind of tree.
    2. Collect the (desired) subtrees of this tree.
    3. Turn trees into strings.

    Concerning 1, we first need a tree data type

    import Text.Parsec
    import Text.Parsec.String
    import Control.Applicative ((<$>))
    
    data Tree = Leaf String | Node [Tree]
    

    and then a function that can parse strings into values of this type.

    parseTree :: Parser Tree
    parseTree = node <|> leaf
      where
        node = Node <$> between (char '(') (char ')') (many parseTree)
        leaf = Leaf <$> many1 (noneOf "()")
    

    In my version I do consider the hole string between parenthesis as a Leaf node (i.e., I do not split at white-spaces).

    Now we need to collect the subtrees of a tree we are interested in:

    nodes :: Tree -> [Tree]
    nodes (Leaf _) = []
    nodes t@(Node ts) = t : concatMap nodes ts
    

    Finally, a Show-instance for Trees allows us to turn them into strings.

    instance Show Tree where
      showsPrec d (Leaf x) = showString x
      showsPrec d (Node xs) = showString "(" . showList xs . showString ")"
        where
          showList [] = id
          showList (x:xs) = shows x . showList xs
    

    Then the original task can be solved, e.g., by:

    parseGroups :: Parser [String]
    parseGroups = map show . nodes <$> parseTree
    
    > parseTest parseGroups "((a b c) a b c)"
    ["((a b c) a b c)","(a b c)"]