haskell xpath monoids hxt arrow-abstraction

Logical OR in HXT without duplicating results


I'm having a little trouble with HXT: I am trying to locate all the nodes in a document that match some criteria, and I'm trying to combine the lenses/XPaths as predicates in an OR-like fashion, using Control.Arrow.<+>, as this guide suggests. However, when I try to "run" the arrow on my document, I am getting duplicate results. Is there an easy way to remove the duplicates, or to combine the tests in a more meaningful way? Here is my code:

run :: App -> IO ()
run a = do
  inputContents <- readFile (input a)
  let doc = readString [withParseHTML yes, withWarnings no] inputContents
  links <- runX . xshow $ doc >>> indentDoc //> cssLinks
  mapM_ putStrLn links

cssLinks = links >>> (rels <+> hrefs <+> types)
  where
    links = hasName "link"
    rels = hasAttrValue "rel" (isInfixOf "stylesheet")
    hrefs = hasAttrValue "href" (endswith ".css")
    types = hasAttrValue "type" (== "text/css")

Yet every time I run this (on any web page), I get duplicated results / nodes. I noticed that <+> is part of the ArrowPlus typeclass, which mimics a monoid, and ArrowXML is an instance of both ArrowList and ArrowTree, which gives me a lot to work with. Would I have to construct ArrowIf predicates? Any help with this would be wonderful :)


Solution

  • You can collect the arrow's results as a [XmlTree], remove the duplicates with Data.List.nub, and only then render the string representation.

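    {-# LANGUAGE PackageImports #-}  -- needed for the package-qualified import below
    import qualified Data.List as L
    import "hxt" Text.XML.HXT.DOM.ShowXml as SX
    ...
    
      links <- runX $ doc >>> indentDoc //> cssLinks
    
      -- first remove duplicates (L.nub), then render the trees with SX.xshow
      putStrLn (SX.xshow . L.nub $ links)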
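
The duplicates come from how <+> behaves on list arrows: it concatenates the results of both branches, so a node that passes several tests is emitted once per passing test. The sketch below illustrates this with base only, using Kleisli [] as a stand-in for HXT's arrows (an assumption: it models a filter as a function from a node to a list of results). Link, Filter, hasAttrValue' and orElse' are hypothetical names made up for the illustration, and isSuffixOf stands in for endswith.

```haskell
import Control.Arrow (Kleisli (..), (<+>))
import Data.List (isInfixOf, isSuffixOf, nub)

-- Hypothetical stand-in for a <link> element: just its attributes.
type Link = [(String, String)]

-- A list arrow: zero or more results per input, like an HXT filter.
type Filter = Kleisli [] Link Link

-- Keep the node if the named attribute exists and satisfies the predicate.
hasAttrValue' :: String -> (String -> Bool) -> Filter
hasAttrValue' name p = Kleisli $ \l -> [l | Just v <- [lookup name l], p v]

rels, hrefs, types :: Filter
rels  = hasAttrValue' "rel"  ("stylesheet" `isInfixOf`)
hrefs = hasAttrValue' "href" (".css" `isSuffixOf`)
types = hasAttrValue' "type" (== "text/css")

-- (<+>) appends the results of each branch, so matches accumulate.
withPlus :: Filter
withPlus = rels <+> hrefs <+> types

-- Left-biased choice: only try g when f yields nothing.
orElse' :: Filter -> Filter -> Filter
orElse' f g = Kleisli $ \x -> case runKleisli f x of
  [] -> runKleisli g x
  ys -> ys

withOrElse :: Filter
withOrElse = rels `orElse'` hrefs `orElse'` types

main :: IO ()
main = do
  let link = [("rel", "stylesheet"), ("href", "main.css"), ("type", "text/css")]
  print (length (runKleisli withPlus link))        -- 3: one copy per passing test
  print (length (nub (runKleisli withPlus link)))  -- 1: duplicates removed afterwards
  print (length (runKleisli withOrElse link))      -- 1: never duplicated at all
```

HXT ships a combinator with the same left-biased behaviour, orElse in Control.Arrow.ArrowIf, so links >>> (rels `orElse` hrefs `orElse` types) is probably the closest drop-in for the original cssLinks. That line is untested against the asker's document. Unlike nub, it would also keep two distinct link elements that happen to be structurally identical.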