Consider XML input like this:
<root>
<sub>
<p att1=0 att2=1><i>foo</i></p>
<p att1=1 att2=1><i>bar</i></p>
<p att1=0 att2=0><i>baz</i></p>
<p att1=0 att2=1><i>bazz</i></p>
</sub>
</root>
Which should be transformed to:
<root>
<sub>
<p att1=0 att2=1><i>foo</i><i>bazz</i></p>
<p att1=1 att2=1><i>bar</i></p>
<p att1=0 att2=0><i>baz</i></p>
</sub>
</root>
(Because both p
parent elemets of <i>foo</i>
and <i>bazz</i>
are siblings and have the same attributes.)
How to do such a transformation with HXT arrows?
Ok, there is my try: The code first collects all attribute lists at the parent of the siblings and then does a merge for all attribute lists that are different:
{-# LANGUAGE Arrows #-}
module Main
where
import Data.List
import Text.XML.HXT.Core
example="\
\<root>\
\ <sub>\
\ <p att1=\"0\" att2=\"1\"><i>foo</i></p>\
\ <p att1=\"1\" att2=\"1\"><i>bar</i></p>\
\ <p att1=\"0\" att2=\"0\"><i>baz</i></p>\
\ <p att1=\"0\" att2=\"1\"><i>bazz</i></p>\
\ </sub>\
\</root>"
get_attrs name = getChildren >>> hasName name >>> proc x -> do
a <- listA (((
getAttrName
&&& (getChildren >>> getText)) ) <<< getAttrl ) -< x
returnA -< a
has_attrs atts = proc x -> do
a <- listA (((
getAttrName
&&& (getChildren >>> getText)) ) <<< getAttrl ) -< x
if (a == atts)
then returnA -< x
else none -<< ()
mk_attrs atts = map f atts
where
f (n, v) = sqattr n v
mergeSiblings_one inp name att = catA (map constA inp)
>>> mkelem name
(mk_attrs att)
[getChildren
>>> hasName name >>> has_attrs att >>> getChildren ]
mergeSiblings_core name = proc x -> do
a <- listA (get_attrs name >>. (sort.nub) ) -< x
b <- listA this -< x
c <- listA (getChildren >>> neg (hasName name)) -< x
catA ((map (mergeSiblings_one b name) a) ++ (map constA c) ) -<< ()
is_parent_of name = getChildren >>> hasName name
mergeSiblings name = processTopDownUntil (
is_parent_of name `guards` mergeSiblings_core name
)
stuff = mergeSiblings "p"
main :: IO ()
main
= do
x <- runX (
configSysVars [withTrace 1]
>>> readString [withValidate no
,withPreserveComment yes
,withRemoveWS yes
] example
>>> setTraceLevel 4
>>> stuff >>> traceTree >>> traceSource
)
return ()
<root>
<p att1="0" att2="0">
<i>baz</i>
</p>
<p att1="0" att2="1">
<i>foo</i>
<i>bazz</i>
</p>
<p att1="1" att2="1">
<i>bar</i>
</p>
</root>
The above version puts the merged children in front and the none-matching ones in the new children list of the parent node: A nice variation would be: insert each merged child at the old position of the first old sibling node and don't change the order of non-merged nodes. For example that
<other>1</other><p><a/></p><other>2</other><p><b/></p>
is transformed to
<other>1</other><p><a/><b/></p><other>2</other>
and not to:
<p><a/><b/></p><other>1</other><other>2</other>
Since I am new to HXT and arrows - I wouldn't be surprised if there are more concise/HXT0idiomatic/elegant answers.