
HXT: how to “lift” children of certain elements?


Suppose I have this MathML document

<?xml version="1.0" encoding="UTF-8"?>
<math xmlns="http://www.w3.org/1998/Math/MathML">
    <mi> f </mi> 
    <mo> &ApplyFunction; </mo> 
    <mrow> 
      <mo> ( </mo> 
      <mi> x </mi> 
      <mo> ) </mo> 
    </mrow> 
</math>

Suppose I want to “lift” the children of mi and mrow. I should then get the following (let’s ignore the whitespace changes here):

<?xml version="1.0" encoding="UTF-8"?>
<math xmlns="http://www.w3.org/1998/Math/MathML">
    f
    <mo> &ApplyFunction; </mo> 
    <mo> ( </mo> 
    x
    <mo> ) </mo> 
</math>

How should I write that with HXT?

I’m a Haskell newbie... so all I have right now is

-- Dealing with command line arguments and stuff…
processRootElement :: IOSArrow XmlTree XmlTree
processRootElement
    = processTopDown -- What goes here?

Solution

    Quick type rundown

    type XmlTree = NTree XNode
    

    meaning that every XmlTree has the form

    NTree XNode [XmlTree]
    

    where the first field is the current node and the second is the list of its children.
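To make this shape concrete, here is a small, dependency-free toy model. Assumption: the `XNode` below is a cut-down stand-in with only `XTag` (a plain `String` name, no attributes) and `XText`; HXT's real `XNode` has more constructors and uses `QName`. The `NTree` definition follows the same shape as HXT's (a node plus a list of child trees). `mrowExample` and `childTagNames` are hypothetical names for illustration.

```haskell
-- Toy model for illustration only; not HXT's own definitions.
data NTree a = NTree a [NTree a] -- the node itself, then its children
  deriving (Eq, Show)

-- Cut-down stand-in for HXT's XNode: element name as a String, no attributes.
data XNode = XTag String | XText String
  deriving (Eq, Show)

type XmlTree = NTree XNode

-- <mrow><mo>(</mo><mi>x</mi><mo>)</mo></mrow>, whitespace text omitted
mrowExample :: XmlTree
mrowExample =
  NTree (XTag "mrow")
    [ NTree (XTag "mo") [NTree (XText "(") []]
    , NTree (XTag "mi") [NTree (XText "x") []]
    , NTree (XTag "mo") [NTree (XText ")") []]
    ]

-- Names of the element children of a node (text children are skipped).
childTagNames :: XmlTree -> [String]
childTagNames (NTree _ children) = [name | NTree (XTag name) _ <- children]
```

Loaded in GHCi, `childTagNames mrowExample` evaluates to `["mo","mi","mo"]`: the pattern on the second field is all it takes to reach a node's children.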

    The creative process

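    processTopDown takes the tree transformation you provide and applies it recursively: first to a node, then to each of the (already transformed) node's children. Its counterpart processBottomUp transforms a node's children before the node itself.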

    First, let's define the tree transformation you want on a single node:

    1. Go through the children of the current node
    2. If any match the tags we specify, then
      1. Take all the children of the tag, and
      2. Make them children of the current node instead
      3. Then remove the tag
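Before bringing in arrows, these steps can be sketched as a plain function on the same kind of toy tree. Assumption: simplified `NTree`/`XNode` stand-ins with string tag names, not HXT's types; `liftOnce` is a hypothetical name. Each child either survives as itself or is replaced by its own children, so the new child list is a concatenation of per-child lists.

```haskell
-- Toy stand-ins for HXT's types (illustration only).
data NTree a = NTree a [NTree a] deriving (Eq, Show)
data XNode = XTag String | XText String deriving (Eq, Show)
type XmlTree = NTree XNode

-- The single-node transformation: every child whose tag is in `tags`
-- is replaced by its own children; all other children are kept.
liftOnce :: [String] -> XmlTree -> XmlTree
liftOnce tags (NTree node children) = NTree node (concatMap step children)
  where
    step (NTree (XTag name) grandchildren)
      | name `elem` tags = grandchildren -- lift the children, drop the tag
    step child = [child]                 -- no match: keep the child as-is
```

Note that this lifts only one level: a matching element nested inside another matching element ends up as a child of the current node, unlifted.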

    Note that the transformation never lifts the children of the current node itself, since at the root there would be no parent to lift them into.

    A good way to do this is processChildren, an arrow that computes new children for the current node from its old ones. Since each old child is either kept or replaced by its own children, we'll need a conditional arrow.

    We can then split the design into two parts: a predicate that matches the tags we want, and the transformation to perform on a match.

    The predicate

    Recalling the forms an XNode can take, the one we're interested in is

    XTag QName XmlTrees
    

    where the QName is the element name and the XmlTrees are the element's attributes. We want to match nodes of this form whose name is one of our tag names, so we write a helper function:

    -- True when the node is an element whose name equals qname
    filterOnQName :: QName -> XNode -> Bool
    filterOnQName qname (XTag xqname _) = qname == xqname
    filterOnQName _ _                   = False
    

    For ease of use, we want to write the tags as strings, so we'll use mkName to convert them into QNames. Then our more useful filter function is

    -- If NTree's constructor isn't in scope: import Data.Tree.NTree.TypeDefs (NTree (..))
    filterTags :: [String] -> XmlTree -> Bool
    filterTags tagNames (NTree xnode _) =
      any (\qname -> filterOnQName qname xnode) (map mkName tagNames)
    

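    But this isn't an arrow, and we need one because ifA (used below) takes arrows as its arguments. We can turn it into one with isA, which passes its input through unchanged when the predicate holds and produces no output otherwise:

    childFilter :: ArrowList a => [String] -> a XmlTree XmlTree
    childFilter tags = isA (filterTags tags)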
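One way to see what `isA` does is to model a list arrow as a function from one input to a list of outputs. HXT's `LA` arrow has this general shape (it is run with `runLA`), but the definitions below are a self-contained sketch, not HXT's source; `isTagIn` is a hypothetical toy version of `filterTags` on simplified types.

```haskell
-- Sketch: a list arrow as a function returning zero or more results.
newtype LA b c = LA { runLA :: b -> [c] }

-- Pass the input through if the predicate holds, otherwise produce no result.
isA :: (b -> Bool) -> LA b b
isA p = LA (\x -> [x | p x])

-- Toy stand-ins for HXT's tree types (illustration only).
data NTree a = NTree a [NTree a] deriving (Eq, Show)
data XNode = XTag String | XText String deriving (Eq, Show)

-- Toy filterTags: is this node an element whose name is in `tags`?
isTagIn :: [String] -> NTree XNode -> Bool
isTagIn tags (NTree (XTag name) _) = name `elem` tags
isTagIn _ _ = False

childFilter :: [String] -> LA (NTree XNode) (NTree XNode)
childFilter tags = isA (isTagIn tags)
```

An empty result list is how a list arrow says "no", which is exactly what `ifA` tests for.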
    

    The transformation

    We're going to need two arrows for the transformation body: one for when the filter matches, and one for when it doesn't.

    When it doesn't, the transformation is easy: we keep the current node.

    filterFailed = this
    

    Here, this is the identity arrow - it does nothing.

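    When the filter matches, we want the matched node's children. First, a helper (called childrenOf rather than getChildren, because Text.XML.HXT.Core already exports a getChildren arrow and reusing the name would make it ambiguous):

    childrenOf :: XmlTree -> [XmlTree]
    childrenOf (NTree _ children) = children
    

    Handily, because we're working with list arrows, we can turn this straight into an arrow using arrL (HXT's built-in getChildren arrow does the same thing, so liftChildren = getChildren works too):

    liftChildren :: ArrowList a => a XmlTree XmlTree
    liftChildren = arrL childrenOf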
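In the same list-arrow model, `arrL` just wraps a function that already returns a list, so one input node fans out into one result per child. A sketch under the same assumptions (simplified `LA`, `NTree` and `XNode`, not HXT's source; `childrenOf` is a hypothetical helper name):

```haskell
-- Sketch: a list arrow as a function returning zero or more results.
newtype LA b c = LA { runLA :: b -> [c] }

-- Lift a list-producing function into a list arrow.
arrL :: (b -> [c]) -> LA b c
arrL f = LA f

-- Toy stand-ins for HXT's tree types (illustration only).
data NTree a = NTree a [NTree a] deriving (Eq, Show)
data XNode = XTag String | XText String deriving (Eq, Show)

childrenOf :: NTree a -> [NTree a]
childrenOf (NTree _ children) = children

-- One node in, one result per child out (none for a leaf).
liftChildren :: LA (NTree a) (NTree a)
liftChildren = arrL childrenOf
```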
    

    Combining them

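    We can now combine these into a single arrow using ifA, the arrow version of if: when the first arrow yields any result for the input, ifA applies the second arrow, otherwise the third.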

    liftMatchedChildren tags = ifA (childFilter tags) liftChildren filterFailed
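Putting the pieces together in the toy list-arrow model shows how each child is routed: when the filter yields a result, the child is replaced by its children; otherwise `this` keeps it, and `processChildren` concatenates the per-child results. Every definition below is a simplified stand-in written for illustration, not HXT's implementation; only the names mirror HXT's.

```haskell
-- Sketch: a list arrow as a function returning zero or more results.
newtype LA b c = LA { runLA :: b -> [c] }

-- Toy stand-ins for HXT's tree types (illustration only).
data NTree a = NTree a [NTree a] deriving (Eq, Show)
data XNode = XTag String | XText String deriving (Eq, Show)
type XmlTree = NTree XNode

isA :: (b -> Bool) -> LA b b
isA p = LA (\x -> [x | p x])

arrL :: (b -> [c]) -> LA b c
arrL = LA

-- Identity arrow: exactly one result, the input itself.
this :: LA b b
this = LA (\x -> [x])

-- If `cond` yields any result for the input, run `thenA`, else `elseA`.
ifA :: LA b c -> LA b d -> LA b d -> LA b d
ifA cond thenA elseA =
  LA (\x -> if null (runLA cond x) then runLA elseA x else runLA thenA x)

-- New children = concatenated results of `f` applied to each old child.
processChildren :: LA XmlTree XmlTree -> LA XmlTree XmlTree
processChildren f = LA (\(NTree n cs) -> [NTree n (concatMap (runLA f) cs)])

isTagIn :: [String] -> XmlTree -> Bool
isTagIn tags (NTree (XTag name) _) = name `elem` tags
isTagIn _ _ = False

liftMatchedChildren :: [String] -> LA XmlTree XmlTree
liftMatchedChildren tags =
  ifA (isA (isTagIn tags)) (arrL (\(NTree _ cs) -> cs)) this
```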
    

    And finally, we can describe the transformation we wanted

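    -- processBottomUp (not processTopDown), so the mi nested inside mrow is lifted too
    processRootElement :: IOSArrow XmlTree XmlTree
    processRootElement
      = processBottomUp (processChildren (liftMatchedChildren ["mi", "mrow"]))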
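Traversal order matters for nested matches such as the `<mi> x </mi>` inside `<mrow>`. With the transformation above, a top-down pass rewrites the children of `math` first and only then descends into them, so the `mi` that `mrow` hands up to `math` is never checked again at that level. A bottom-up pass flattens `mrow`'s children before `mrow` itself is lifted. The sketch below models both orders with plain functions on the toy tree; the types are simplified stand-ins, and `topDown`/`bottomUp` are hypothetical names modelled on how `processTopDown` and `processBottomUp` order their work. `"\x2061"` is U+2061 FUNCTION APPLICATION, the character `&ApplyFunction;` stands for.

```haskell
-- Toy stand-ins for HXT's types (illustration only).
data NTree a = NTree a [NTree a] deriving (Eq, Show)
data XNode = XTag String | XText String deriving (Eq, Show)
type XmlTree = NTree XNode

-- One level of lifting (the single-node transformation).
liftOnce :: [String] -> XmlTree -> XmlTree
liftOnce tags (NTree node children) = NTree node (concatMap step children)
  where
    step (NTree (XTag name) grandchildren)
      | name `elem` tags = grandchildren
    step child = [child]

-- Like processTopDown: transform the node, then recurse into the result's children.
topDown :: (XmlTree -> XmlTree) -> XmlTree -> XmlTree
topDown f t = case f t of
  NTree n cs -> NTree n (map (topDown f) cs)

-- Like processBottomUp: recurse into the children first, then transform the node.
bottomUp :: (XmlTree -> XmlTree) -> XmlTree -> XmlTree
bottomUp f (NTree n cs) = f (NTree n (map (bottomUp f) cs))

el :: String -> [XmlTree] -> XmlTree
el name = NTree (XTag name)

txt :: String -> XmlTree
txt s = NTree (XText s) []

-- The question's document, whitespace omitted.
mathDoc :: XmlTree
mathDoc =
  el "math"
    [ el "mi" [txt "f"]
    , el "mo" [txt "\x2061"]
    , el "mrow" [el "mo" [txt "("], el "mi" [txt "x"], el "mo" [txt ")"]]
    ]
```

Evaluating `topDown (liftOnce ["mi", "mrow"]) mathDoc` still contains `el "mi" [txt "x"]`, while `bottomUp (liftOnce ["mi", "mrow"]) mathDoc` gives the flat structure the question asks for.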