Search code examples
xmlhaskellhxt

Extract multiple elements in an Arrow proc context


I want to parse the following sample XML file, without the pickler module.

<?xml version="1.0" encoding="utf-8" ?>
<Groups> 
    <Name>ABC</Name>
    <GroupA>
        <Name>Foo</Name>
        <Sum>100</Sum>
    </GroupA>
    <GroupB>
        <Name>Bar</Name>
        <Sum>0</Sum>
    </GroupB>
</Groups>

I ended up with this:

{-# language Arrows #-}

import Text.XML.HXT.Core

data Groups = Groups GroupA GroupB deriving Show
data GroupA = GroupA String String deriving Show
data GroupB = GroupB String String deriving Show


readGroup :: LA XmlTree Groups
readGroup = deep (isElem >>> hasName "Groups") >>> getChildren >>>
  proc root -> do
    a <- readGroupA -< root
    b <- readGroupB -< root
    returnA -< Groups a b

readGroupA :: LA XmlTree GroupA
readGroupA = isElem >>> hasName "GroupA" >>> getChildren >>>
  proc root -> do
    n <- isElem >>> hasName "Name" /> getText -< root
    s <- isElem >>> hasName "Sum"  /> getText -< root
    returnA -< GroupA n s

readGroupB :: LA XmlTree GroupB
readGroupB = isElem >>> hasName "GroupB" >>> getChildren >>>
  proc root -> do
    n <- isElem >>> hasName "Name" /> getText -< root
    s <- isElem >>> hasName "Sum"  /> getText -< root
    returnA -< GroupB n s

Unfortunately, this does not work. If I try to extract just a single element in a proc context it works. But trying to extract multiple elements will always fail\ return the empty list. I might have a misunderstanding of the composition >>>.

I run the example with runLa (xreadDoc >>> readGroups)


Solution

  • Try this:

    readGroup :: LA XmlTree Groups
    readGroup = deep (isElem >>> hasName "Groups") >>>
      proc root -> do
        a <- getChildren >>> readGroupA -< root
        b <- getChildren >>> readGroupB -< root
        returnA -< Groups a b
    
    readGroupA :: LA XmlTree GroupA
    readGroupA = isElem >>> hasName "GroupA" >>>
      proc root -> do
        n <- getChildren >>> isElem >>> hasName "Name" /> getText -< root
        s <- getChildren >>> isElem >>> hasName "Sum"  /> getText -< root
        returnA -< GroupA n s
    
    readGroupB :: LA XmlTree GroupB
    readGroupB = isElem >>> hasName "GroupB" >>>
      proc root -> do
        n <- getChildren >>> isElem >>> hasName "Name" /> getText -< root
        s <- getChildren >>> isElem >>> hasName "Sum"  /> getText -< root
        returnA -< GroupB n s
    

    When the call to getChildren is outside the do-block, you are committing to one child before even entering the proc. Inside the proc, you check (for instance) whether that child has name Name and name Sum. Unsurprisingly, you don't find any child fitting these contradictory requirements.

    By moving getChildren inside, you allow different children to be traversed for (for instance) n and s.