Search code examples
xmlhaskellxml-parsingarrowshxt

HXT: Can an input change with the arrow syntax?


With the following code

{-# LANGUAGE Arrows #-}
{-# LANGUAGE NoMonomorphismRestriction #-}
import Text.XML.HXT.Core

parseXml :: IOSArrow XmlTree XmlTree
parseXml = getChildren >>> getChildren >>>
  proc x -> do
    y <- x >- hasName "item"
    returnA -< x

main :: IO ()
main = do
    person <- runX (readString [withValidate no]
                    "<xml><item>John</item><item2>Smith</item2></xml>"
                    >>> parseXml)
    putStrLn $ show person
    return ()

I get the output

[NTree (XTag "item" []) [NTree (XText "John") []]]

So it seems that hasName "item" was applied to x which I did not expect. Using arrowp I get for parseXml:

parseXml
   = getChildren >>> getChildren >>>
      (arr (\ x -> (x, x)) >>>
         (first (hasName "item") >>> arr (\ (y, x) -> x)))

So I have the arrow diagram

                                                       y
                                   /-- hasName "item" ---
                               x  /                       
-- getChildren -- getChildren ---\x->(x,x)              \(y,x)->x --- final result
                                  \                       / 
                                   \---------------------/  

Why is hasName "item" also applied to second place of the tuple? I thought there is no state in haskell and hasName "item" x returns a new object instead of changing the internal state of x.

Related question: Is factoring an arrow out of arrow do notation a valid transformation?

My original problem

I have the following code:

{-# LANGUAGE Arrows #-}
import Text.XML.HXT.Core

data Person = Person { forname :: String, surname :: String } deriving (Show)

parseXml :: IOSArrow XmlTree Person
parseXml = proc x -> do
    forname <- x >- this /> this /> hasName "fn" /> getText
    surname <- x >- this /> this /> hasName "sn" /> getText
    returnA -< Person forname surname

main :: IO ()
main = do
    person <- runX (readString [withValidate no]
                               "<p><fn>John</fn><sn>Smith</sn></p>"
                    >>> parseXml)
    putStrLn $ show person
    return ()

If I run it everything works fine and I get the output

[Person {forname = "John", surname = "Smith"}]

But if I change the parseXml to avoid the this statements

parseXml :: IOSArrow XmlTree Person
parseXml = (getChildren >>> getChildren) >>> proc x -> do
    forname <- x >- hasName "fn" /> getText
    surname <- x >- hasName "sn" /> getText
    returnA -< Person forname surname

no person can be parsed anymore (output is []). Investigating the problem with

parseXml :: IOSArrow XmlTree Person
parseXml = (getChildren >>> getChildren) >>>
  proc x -> do
    forname <- x >- withTraceLevel 5 traceTree >>> hasName "fn" /> getText
    surname <- x >- hasName "sn" /> getText
    returnA -< Person forname surname

I got the output

content of: 
============

---XTag "fn"
   |
   +---XText "John"



content of: 
============

---XTag "sn"
   |
   +---XText "Smith"


[]

So everything seems fine, but with the code

parseXml :: IOSArrow XmlTree Person
parseXml = (getChildren >>> getChildren) >>>
  proc x -> do
    forname <- x >- hasName "fn" /> getText
    surname <- x >- withTraceLevel 5 traceTree >>> hasName "sn" /> getText
    returnA -< Person forname surname

I got

content of: 
============

---XTag "fn"
   |
   +---XText "John"


[]

So it seems to me, that the value of the input x changes between the two statements. It looks like the hasName "fn" was applied to x before it was attached to the surname arrow. But shall x not remain the same between the two lines?


Solution

  • No, the input can't change and it doesn't.

    What you've programmed in the lines

    proc x -> do
      y <- x >- hasName "item"
      returnA -< x
    

    is just a filter removing all nodes not named item. His is equivalent to the arrow

    hasName "item" `guards` this
    

    You can test this with

    {-# LANGUAGE Arrows #-}
    {-# LANGUAGE NoMonomorphismRestriction #-}
    
    module Main where
    
    import Text.XML.HXT.Core
    
    parseXml0 :: IOSArrow XmlTree XmlTree
    parseXml0 = getChildren >>> getChildren >>>
      proc x -> do
        _ <- hasName "item" -< x
        returnA -< x
    
    parseXml1 :: IOSArrow XmlTree XmlTree
    parseXml1 = getChildren >>> getChildren >>>
                (hasName "item" `guards` this)
    
    main1 :: Show c => IOSArrow XmlTree c -> IO ()
    main1 parseXml = do
        person <- runX (readString [withValidate no]
                        "<xml><item>John</item><item2>Smith</item2></xml>"
                        >>> parseXml)
        putStrLn $ show person
        return ()
    
    main :: IO ()
    main = main1 parseXml0 >> main1 parseXml1