Search code examples
xmlhaskellwhitespacehxt

Copy an XML file preserving closing tags of some elements


I'm hacking a thing that should copy an XML file and edit little part of it. Editing now is OK, but interestingly enough copying may be quite tricky. This is essentially «reverse engineering» work and now I know that I should somehow preserve closing tags of some elements (even if the elements contain only white space or are empty). The problem is when HXT reads something like

<tag>
</tag>

it then prints it as

<tag/>

I can tell it to always use explicit closing tag (or whatever you call it) specifying withOutputXHTML option for writeDocument function, however there are elements that are written as

<tag/>

that should be copied «as is».

So, essentially my problem boils down to: «How to copy this file preserving closing tags of some specific elements?»:

<foo>
  <bar>
  </bar>
  <baz/>
</foo>

Simple copying program for reference/experiment:

module Main (main) where

import Control.Monad (void)
import Text.XML.HXT.Core

main :: IO ()
main = void $ runX $
       readDocument [ withValidate no ] "test.xml" >>>
       writeDocument [ withIndent yes
                     , withOutputEncoding isoLatin1
                     , withOutputXHTML ] "result.xml"

Solution

  • After long, frustrating searching, I've decided to try every option in Text.XML.HXT.Arrow.XmlState. Some options just have no doc strings, so it's a guessing game.

    Finally, I've found this marvel:

    withNoEmptyElemFor :: [String] -> SysConfig

    Although it has no doc string, its name sounds quite promising. Indeed, with help of this option we can specify names of elements which «cannot be empty».

    This option can be used with writeDocument or configSysVars. I like the second arrow better because I can use it locally, it's useful if you have several arrows that perform processing of slightly different documents that may have different collections of tags that shouldn't be empty (that's my case).

    So, returning to my example, we can fix it by writing:

    module Main (main) where
    
    import Control.Monad (void)
    import Text.XML.HXT.Core
    
    main :: IO ()
    main = void $ runX $
           readDocument [ withValidate no ] "test.xml" >>>
           writeDocument [ withIndent yes
                         , withOutputEncoding isoLatin1
                         , withNoEmptyElemFor ["bar"] ] "result.xml"