How to edit specific elements in XML file using HXT?


In short, here is what I want to accomplish:

"foo.xml":

<?xml version="1.0"?>
<foo>
  <bar>
    <baz>
      <a>foo</a>
      <a>bar</a>
      <a>baz</a>
    </baz>
  </bar>
</foo>

expected result (contents of "bar.xml"):

<?xml version="1.0"?>
<foo>
  <bar>
    <baz>
      <a>foo!</a>
      <a>bar!</a>
      <a>baz!</a>
    </baz>
  </bar>
</foo>

My attempt at the problem:

module Main (main) where

import Control.Monad

import Control.Arrow.ArrowTree
import Text.XML.HXT.Core

main :: IO ()
main = void . runX $ readDocument [] "foo.xml" >>>
       applic >>> writeDocument [withIndent yes] "bar.xml"

applic :: IOSArrow XmlTree XmlTree
applic = getChildren >>> hasName "foo"
--       ^^ because of extra root node (?)
         /> hasName "bar" /> hasName "baz" /> hasName "a" >>> changeText excl

excl :: String -> String
excl = (++ "!")

Question: How can I edit just the selected elements in place, without changing or removing the elements that enclose them? Note also that this program does not create the "bar.xml" file at all, so something is definitely wrong. Tracing shows that after the applic arrow is applied, the document consists of only the three a elements ("foo", "bar", and "baz"), without exclamation points.


Solution

  • I don't claim to be good at HXT and haven't used it much, but after some experimentation I got what you want working. If someone with more HXT experience has a better solution, feel free to offer it.

    Skimming the HXT wiki, I found the process* functions, such as processTopDown and processChildren, among several others. These seem to be what actually lets you change a document in place. I'm assuming your actual use case is more complex, so you may only want to select elements at a certain level. The pattern I stumbled upon is to use processChildren together with HXT's when, not the one from Control.Monad; they are not the same. (With the question's imports, change the import to Control.Monad (void) so that when is unambiguous.) My first implementation was

    applic
        = processChildren
        $ flip when (isElem >>> hasName "foo")
            $ processChildren
            $ flip when (isElem >>> hasName "bar")
                $ processChildren
                $ flip when (isElem >>> hasName "baz")
                    $ processChildren
                    $ flip when (isElem >>> hasName "a")
                        $ processChildren
                            $ flip when isText
                                $ changeText excl
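
To see why when matters here, the following is a minimal sketch (not part of the original answer) that compares three arrows on the fragment <a>foo</a>. It assumes the pure LA arrow with xread and xshow from Text.XML.HXT.Core instead of file I/O; the names render, onElement, selected and inPlace are made up for illustration.

```haskell
import Text.XML.HXT.Core

excl :: String -> String
excl = (++ "!")

-- Illustrative helper: parse a fragment, apply an arrow, and
-- serialise every tree the arrow returns.
render :: LA XmlTree XmlTree -> String -> [String]
render f = runLA (xread >>> f >>> xshow this)

-- As in the question: changeText is applied to the element node itself.
onElement :: LA XmlTree XmlTree
onElement = changeText excl

-- Selecting the text child first: the edit happens, but the arrow
-- returns the text node only.
selected :: LA XmlTree XmlTree
selected = getChildren >>> isText >>> changeText excl

-- Processing the children in place: the element is kept and its
-- text child is rewritten.
inPlace :: LA XmlTree XmlTree
inPlace = processChildren (changeText excl `when` isText)

main :: IO ()
main = mapM_ (\f -> print (render f "<a>foo</a>")) [onElement, selected, inPlace]
```

onElement mirrors the question's attempt: changeText only acts on text nodes, so the a element passes through untouched. selected does reach the text but hands back just the text node, which is how the enclosing elements get lost. inPlace keeps the element and edits its child, which is the behaviour the answer builds on.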
    

    That first implementation is quite ugly to me: there is far too much repetition. So I abstracted it into something much more readable:

    -- Fixity is important here, must be right-associative.
    infixr 5 />/
    (/>/) :: ArrowXml a => String -> a XmlTree XmlTree -> a XmlTree XmlTree
    name />/ action
        = processChildren
        $ action `when` (isElem >>> hasName name)
    
    applic = "foo" />/ "bar" />/ "baz" />/ "a" />/
        processChildren (
            changeText excl `when` isText
        )
    

    The repeated calls to processChildren may be redundant, particularly if you are just drilling down into the structure, but this definitely works, and it won't modify a elements in other parts of the file.
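
For checking this without touching the file system, here is a hedged, self-contained sketch that runs the same />/ operator over an in-memory string. It assumes the pure LA arrow, with root [] [xread] standing in for the extra root node that readDocument produces. The names atPath, editString and sample are hypothetical additions, not part of the answer above.

```haskell
import Text.XML.HXT.Core

-- Same operator as in the answer; it must be right-associative.
infixr 5 />/
(/>/) :: ArrowXml a => String -> a XmlTree XmlTree -> a XmlTree XmlTree
name />/ action = processChildren (action `when` (isElem >>> hasName name))

excl :: String -> String
excl = (++ "!")

-- Hypothetical helper: fold a list of element names into nested (/>/) steps.
atPath :: ArrowXml a => [String] -> a XmlTree XmlTree -> a XmlTree XmlTree
atPath names action = foldr (/>/) action names

applic :: ArrowXml a => a XmlTree XmlTree
applic = atPath ["foo", "bar", "baz", "a"]
       $ processChildren (changeText excl `when` isText)

-- Parse a string, wrap it under a synthetic root node, apply the edit,
-- and serialise the root's children again.
editString :: String -> String
editString = concat . runLA (root [] [xread] >>> applic >>> xshow getChildren)

-- Contains an extra <a> directly under <foo> that is not on the path.
sample :: String
sample = "<foo><a>keep</a><bar><baz><a>foo</a><a>bar</a></baz></bar></foo>"

main :: IO ()
main = putStrLn (editString sample)
```

The a element directly under foo is off the foo/bar/baz/a path, so it should come out unchanged while the two nested ones gain an exclamation point. For the original file-based program, the same applic can be dropped into the runX pipeline from the question.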