Search code examples
xmlhaskellhxt

Stubbornness of += operator


As we have learned from my previous question, += is the operator that allows to add one element at a time. Is it possible to «detect» previously added elements and control how future additions are done?

Here is a simple program to start our investigation:

module Main (main) where

import Control.Monad (void)
import Text.XML.HXT.Core

main :: IO ()
main = void $ runX $ root [] [foo]
       >>> writeDocument [withIndent yes] "test.xml"

foo :: ArrowXml a => a XmlTree XmlTree
foo = eelem "foo" += bar += bar += bar -- += is left associative

bar :: ArrowXml a => a XmlTree XmlTree
bar = ifA (deep (hasName "bar")) (eelem "baz") (eelem "bar")

Here, foo creates ‘foo’ node and its contents. Contents are generated with bar arrow, that's supposed to be smart enough to detect previously added ‘bar’ element, and change its behavior. Here we use deep for simplicity: if ‘bar’ element is child of ‘foo’ element, it should be detected, no matter how deep it is (getChildren >>> hasName "bar" should do the trick too).

So, expected contents of test.xml file are:

<?xml version="1.0" encoding="UTF-8"?>
<foo>
  <bar/>
  <baz/>
  <baz/>
</foo>

It doesn't work, of course. Here is what I get:

<?xml version="1.0" encoding="UTF-8"?>
<foo>
  <bar/>
  <bar/>
  <bar/>
</foo>

My questions:

  1. Why ‘bar’ element cannot be detected by bar arrow?

  2. How to detect it?


Solution

  • This is one of those cases where type signatures really help. Stare for a second at the type signature:

    (+=) :: (ArrowXml a) => a b XmlTree -> a b XmlTree -> a b XmlTree 
    

    First off, ArrowXml is a subclass of Arrow, which describes some sort of machinery taking some input to some output. You can think of it like a big factory with conveyor belts taking things to different machines, and we're building these factory machines and hence factories with functions. Three of the Arrow combinators, for example, are:

    (&&&) :: (Arrow a) => a b c -> a b c' -> a b (c, c') |infixr 3|
      Fanout: send the input to both argument arrows and combine their output.
    
    arr :: (Arrow a) => (b -> c) -> a b c
      Lift a function to an arrow.
    
    (.) :: (Category cat) => cat b c -> cat a b -> cat a c
      morphism composition.
    

    Now look at the lowercase letter (type variable) very closely in:

    (+=) :: (ArrowXml a) => a b XmlTree -> a b XmlTree -> a b XmlTree 
    

    Clearly we're taking two machines which turn bs into XmlTrees and "merging them together" into one machine which takes a b in and expels an XmlTree. But importantly, this type signature tells us that more or less the only way that this can be implemented is:

    arr1 += arr2 = arr f . (arr1 &&& arr2) where
        f :: (XmlTree, XmlTree) -> XmlTree
        f = _
    

    This is because of "free theorems"; if you don't know the type of a parameter then we can prove that you can't really do much with it. (It can be a little more complicated because the arrow might have structures which aren't totally encapsulated by arr, like internal counters which are summed together with . which are then set to 0 when using arr. So actually replace arr f with a generic a (XmlTree, XmlTree) XmlTree and you're good to go.)

    So we must have parallel execution of the two arrows here. That's what I'm trying to say. Because the combinator (+=) doesn't know what b is, it has no choice but to blithely feed the b to the arrows in parallel and then try to combine their outputs together. So, deep (hasName "bar") does not look at foo.

    You can possibly make a mutually recursive solution with deep (hasName "bar") . foo if you really want, but it seems potentially dangerous (i.e. infinite loop), so it may be safer to simply define something like:

    a ++= b = a += (b . a)
    

    where the "current" a is "fed" to b to produce the update. To do this you will have to import . from Control.Category as it is not the same as Prelude.. (which does function composition only). This looks like:

    import Prelude hiding ((.))
    import Control.Category ((.))