Search code examples
haskellhxt

Haskell's hxt fails if I add another line a function


I am trying to parse a COLLADA file with Haskell's hxt package.

I have been doing fine, but I have run into an odd bug (or more likely, an error on my part).

I have an arrow that looks like this:

processGeometry = proc x -> do
    geometry <- atTag "geometry" -< x
    meshID <- getAttrValue "id" -< geometry
    meshName <- getAttrValue "name" -< geometry
    mesh <- atTag "mesh" -< geometry
    sources <- hasName "source" <<< getChildren -< mesh
    positionSource <- hasAttrValue "id" ("-positions" `isSuffixOf`) -< sources
    positionArray  <- processFloatSource -< positionSource
    returnA -< positionArray

Adding the line

normalSource <- hasAttrValue "id" ("-normals" `isSuffixOf`) -< sources

near the bottom, however, makes the entire arrow fail.

This happens no matter what I return, even if I am returning the original x.

Here is my atTag function:

atTag tag = deep (isElem >>> hasName tag)

And here is my sample COLLADA file I am trying to parse: https://pastebin.com/mDSTH2TW

Why does adding a line change the outcome of the arrow completely, when it shouldn't do anything at all?


Solution

  • TL;DR: if you're looking for two separate child elements, use separate calls to getChildren.

    Your variable sources doesn't represent the list of all source elements. Instead it's a single source. If you check the type of sources, you'll see it's XMLTree. So when you use hasAttrValue on it twice, you're looking for a single source element that matches both cases.

    As for why it doesn't matter what you return: each line is executed even if its value isn't used. In fact unless you're using the output, you don't even have to assign it a name: a line of only hasAttrValue "id" (isSuffixOf "-normals") <- sources (removing normalSource <-) works just the same. So if you return x, it still returns x only when it can find that impossible source element.

    You can get your code to find both separate source elements by doing separate two calls to getChildren -- one per separate element you're looking for -- and checking the "id" attribute of each one separately.


    Here's a self-contained example if the above is unclear.

    data Tree a = Tree a [Tree a]
    
    exampleTree :: Tree String
    exampleTree = Tree "root" [Tree "childA" [], Tree "childB" []]
    
    newtype ListArrow a b = ListArrow { runListArrow :: a -> [b] }
    
    instance Category ListArrow where
        id = ListArrow (\x -> [x])
        (ListArrow g) . (ListArrow f) = ListArrow (\x -> concatMap g (f x))
    
    instance Arrow ListArrow where
        arr f = ListArrow (\x -> [f x])
        first (ListArrow f) = ListArrow (\(a, b) -> [ (a', b) | a' <- f a ])
    
    getChildren :: ListArrow (Tree a) (Tree a)
    getChildren = ListArrow gc where
        gc (Tree _ children) = children
    
    hasContent :: Eq a => a -> ListArrow (Tree a) (Tree a)
    hasContent content = ListArrow hc where
        hc cur@(Tree c _) = if content == c then [cur] else []
    
    getContent :: ListArrow (Tree a) a
    getContent = ListArrow gc where
        gc (Tree c _) = [c]
    
    -- this has the same problem as the code in the question
    findBothChildrenBad :: ListArrow (Tree String) (String, String)
    findBothChildrenBad = proc root -> do
        -- child is a (single) child of the root
        child <- getChildren -< root
    
        -- childA == child, and filter to only cases where its content is "childA"
        childA <- hasContent "childA" -< child
    
        -- childB == child, and filter to only cases where its content is "childB"
        childB <- hasContent "childB" -< child
        -- now the content has to be both "childA" and "childB" -- so we're stuck
    
        childAContent <- getContent -< childA
        childBContent <- getContent -< childB
        returnA -< (childAContent, childBContent)
    
    -- this is the fixed version
    findBothChildren :: ListArrow (Tree String) (String, String)
    findBothChildren = proc root -> do
        -- childA is a (single) child of the root
        childA <- getChildren -< root
    
        -- filter to only cases where its content is "childA"
        hasContent "childA" -< childA
    
        -- childB is a (potentially different) child of the root
        childB <- getChildren -< root
    
        -- filter to only cases where its content is "childB"
        hasContent "childB" -< childB
        -- we're not stuck here
    
        childAContent <- getContent -< childA
        childBContent <- getContent -< childB
        returnA -< (childAContent, childBContent)