Search code examples
xmlhaskelltreearrowsdo-notation

Haskell: arrows on trees, XML, and Hxt: Transform text leaves into subtrees


Background

The Freeplane app seems to have died its decades-long death. I'm extracting my data from it. Freeplane stores data as XML. One of the first steps I'm taking is to homogenize that format.

Question, and what I wanted to do

My goal is to turn every XText item in an .xml-formatted file into an XTag called "node" with an attribute called "TEXT". I've got it done, but in a manner that seems inelegant.

I wanted to do this:

do 
  t <- getText
  eelem "node" >>> addAttr "TEXT" t

But when I tried, I was informed that there is no monad instance for the IOSLA arrow, hence do-notation is not available.

Is something like that possible?

What I did instead

I dug into the raw, deeply recursive XmlTree data type, and ended up with:

module Ask where

import Control.Category hiding ((.), id)
import Control.Arrow
import Text.XML.HXT.Core
import Data.Tree.NTree.TypeDefs


textToNode :: IOSArrow XmlTree XmlTree
textToNode = arr f where
  f :: XmlTree -> XmlTree
  f (NTree (XText s) children) =
    -- `children` is always an empty subtree for Text items
    let attrs = [ NTree
                  (XAttr $ mkName "TEXT")
                  [NTree (XText s) []] ]
    in NTree (XTag (mkName "node") attrs) children
  f x = x

go :: IO [XmlTree]
go =
  runX $
  readDocument [withValidate no] "flat.xml"
  >>> deepest (ifA isText textToNode none)
  >>> putXmlTree "-"

To see it in action, make a file called "flat.xml" containing:

<doc>
<a>1</a>
<b>2</b>
</doc>

When you run go, you'll get back the "1" and the "2", but inserted into XTags like this:

---XTag "node"
   |   "TEXT"="1"

(You'll also get back some whitespace; it's not important.)


Solution

  • Arrow doesn't have do notation, but it does have something similar, called proc notation or Arrow syntax. See, in particular, this question on the relation between these two notations. With this, you can write in a similar style to what you wanted to do:

    textToNode :: IOSArrow XmlTree XmlTree
    textToNode = proc x -> do
      text <- getText -< x
      node <- eelem "node" -<< text
      tree <- addAttr "TEXT" text -<< node
      returnA -< tree
    

    Here, <- is analogous to its monadic counterpart, and -<< val acts like -< returnA val and allows you to chain arrows. Note that you will have to enable a GHC language extension:

    {-# LANGUAGE Arrows #-}