Search code examples
xmlhaskellrefactoringhxt

Manipulating XML with HXT


From what I can tell, normal HXT seems to be more geared toward queries on XML moreso than XML AST refactoring. However, one of the HXT modules, Data.Tree.NTree.Zippers.TypeDefs seems to have some machinery for diving into a document and doing local work, instead of the more global arrows. However, I can't seem to get anything to work. This is a follow-up post to my earlier HXT issue - all the code is the same, except now trans is replacing this.

Here is the entry point to my program:

start :: App -> IO [XmlTree]
start (App src dest) = runX $
                         readDocument [
                                    --... some settings ...
                                      ]   
                                      src
                         >>>
                         trans
                         >>> 
                         writeDocument [
                                     --... some settings ...
                                       ]
                                       dest

And here is the module where trans is defined:

module Main.Internal where

import Data.Maybe (fromJust)

import Text.XML.HXT.DOM.XmlNode (mkText')
import Text.XML.HXT.Core hiding (addToTheRight)
import Data.Tree.NTree.Zipper.TypeDefs

trans :: IOSLA (XIOState s) XmlTree XmlTree
trans = arrL go
  where
    go :: XmlTree -> [XmlTree]
    go x = [fromNTZipper . manip . toNTZipper $ x]

unList :: [a] -> a
unList []    = error "dun goofed!"
unList (x:_) = x

manip = fromJust . (addToTheRight $ mkText' "bar")
      . fromJust . down

Finally, here is my input file:

<html>
  <head>
    <title>foo</title>
  </head>
  <body>
    <h1>foo</h1>
  </body>
</html>

and my output:

<?xml version="1.0" encoding="US-ASCII"?>
<html>
  <head>
    <title>foo</title>
  </head>
  <body>
    <h1>foo</h1>
  </body>
</html>

So, why isn't "bar" anywhere to be found in my output? Shouldn't it appear right after </html>? Any help would, again, be wonderful :)


Solution

  • Your ideas seem to be solid and I'm not sure where you're wrong, but playing around with it I was able to generate the test code:

    import Data.Tree.NTree.Zipper.TypeDefs
    import Text.XML.HXT.Parser.HtmlParsec
    import Text.XML.HXT.DOM.XmlNode
    import Text.XML.HXT.DOM.TypeDefs
    import Data.Tree.NTree.TypeDefs
    import Control.Arrow.IOListArrow
    import Text.XML.HXT.Arrow.WriteDocument
    str = "<html>\n  <head>\n    <title>foo</title>\n  </head>\n  <body>\n    <h1>foo</h1>\n  </body>\n</html>"
    
    fromJust (Just x) = x
    
    manip :: NTree XNode -> NTree XNode
    manip x = fromNTZipper $ fromJust $
            down (toNTZipper x) >>= addToTheLeft (mkText "Boo!") >>= up
    
    stringify = runIOLA $ writeDocumentToString []
    
    main = do
        xs <- mapM stringify $ map manip $ parseHtmlDocument "" str
        putStrLn (show xs)
    

    which outputs [["\n Boo!<head>\n <title>foo</title>\n </head>\n <body>\n <h1>foo</h1>\n </body>\n"]]. I'm not actually sure what happened to the <html> element but addToTheLeft does exactly what it says it does. (I am using the >>= for the Maybe monad above).

    I don't know if it's trans or >>> above that, but the manip you're doing seems like it should work.

    EDIT: Notice that a lot of what I've written above avoids the idiomatic arrows which are central to HXT, which is probably why I'm getting some strange results. It looks from the package structure that importing Text.XML.HXT.Core is enough for generally reading strings and documents. The following works for me:

    Prelude> let file = "<html>\n  <head>\n    <title>foo</title>\n  </head>\n  <body>\n    <h1>foo</h1>\n  </body>\n</html>"
    Prelude> :m +Text.XML.HXT.Core
    Prelude Text.XML.HXT.Core> let apply (arrows) str = head $ runLA (xshow $ hread >>> arrows) str
    Prelude Text.XML.HXT.Core> :t apply
    apply :: LA XmlTree XmlTree -> String -> String
    Prelude Text.XML.HXT.Core> putStrLn $ apply (withNav $ moveDown >>> addToTheLeft (txt "bar") >>> moveUp) file
    <html>bar
      <head>
        <title>foo</title>
      </head>
      <body>
        <h1>foo</h1>
      </body>
    </html>
    

    so those are the relevant functions. Notice that HXT already seems to do its Maybe stuff by clobbering XML trees in the list (of the LA list arrow) which do not satisfy a given predicate.