Tags: c#, xml, xpath, xmldocument, xmlnode

How to remove XML nodes that are not in an array of XPath strings?


I have an array of XPath values and an XML feed.

When the feed comes in, I want to filter each XML file by removing the nodes that are not selected by any XPath in my array.

I can think of a very dirty way to do this:

1) For each node in the XML, I build its XPath.

2) Check whether it's in the array.

3) If not, remove it.

Is there a cleaner way?


Solution

  • When the feed comes in, I want to filter each XML file by removing the nodes that are not in my array of XPaths

    Step 1. Select all nodes that aren't selected by the given XPath expressions

    I guess that by "nodes" you mean elements. If so, this XPath expression:

    //*[count(. | yourExpr1 | yourExpr2 | ... | yourExprN)
       >
        count(yourExpr1 | yourExpr2 | ... | yourExprN)
       ]
    

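    selects all elements in the XML document that aren't selected by any of your N XPath expressions yourExpr1, yourExpr2, ... , yourExprN. The expressions should be absolute (start with /), because inside the predicate a relative expression is evaluated against each candidate element. Note that this also selects the ancestors of the elements you want to keep unless your expressions select those ancestors too, and removing an ancestor removes its whole subtree.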
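
    As a rough sketch of Step 1 in C#, the expression can be assembled from the array with string.Join and passed to XmlDocument.SelectNodes. The class and method names here (XPathFilterSketch, BuildUnselectedElementsXPath, UnselectedElementNames) are made up for illustration, and the sketch assumes the array is non-empty and holds absolute expressions only.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class XPathFilterSketch
{
    // Builds the Step 1 expression: all elements NOT selected by any kept expression.
    // Assumption: keepExpressions is non-empty and every entry is an absolute XPath.
    public static string BuildUnselectedElementsXPath(string[] keepExpressions)
    {
        string union = string.Join(" | ", keepExpressions);
        return "//*[count(. | " + union + ") > count(" + union + ")]";
    }

    // Returns the names of the unselected elements, in the order SelectNodes yields them.
    public static List<string> UnselectedElementNames(XmlDocument doc, string[] keepExpressions)
    {
        var names = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(BuildUnselectedElementsXPath(keepExpressions)))
        {
            names.Add(node.Name);
        }
        return names;
    }
}
```

    For a hypothetical feed <feed><item><title>t</title><price>1</price></item><ad>x</ad></feed> and the array { "/feed", "/feed/item", "/feed/item/title" }, UnselectedElementNames should return price and ad. The ancestors /feed and /feed/item are listed explicitly so that they are not selected for removal.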

    If by "nodes" you mean elements, text-nodes, processing-instruction-nodes (PIs), comment-nodes and attribute nodes, use this XPath expression to select all nodes not selected by your N XPath expressions:

    (//node() | //*/@*)
       [count(. | yourExpr1 | yourExpr2 | ... | yourExprN)
       >
        count(yourExpr1 | yourExpr2 | ... | yourExprN)
       ]
    

    Step 2. Delete all nodes selected in Step 1.

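    For each of the nodes selected in Step 1 above, use:

     node.ParentNode.RemoveChild(node);

    This works for elements, text nodes, comments and PIs. For an attribute node, ParentNode is always null, so remove it through its owner element instead:

     attr.OwnerElement.RemoveAttributeNode(attr);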
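
    A minimal sketch of the removal loop in C#, assuming an XmlDocument and an XPath string that selects the nodes to delete. NodeRemovalSketch and RemoveSelected are illustrative names, not part of any library. The result of SelectNodes is copied into a list first, since that result is only dependable while the document stays unchanged.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class NodeRemovalSketch
{
    // Detaches every node matched by xpath and returns how many nodes were detached.
    public static int RemoveSelected(XmlDocument doc, string xpath)
    {
        // Snapshot the matches before mutating the document.
        List<XmlNode> doomed = doc.SelectNodes(xpath).Cast<XmlNode>().ToList();
        int removed = 0;

        foreach (XmlNode node in doomed)
        {
            if (node is XmlAttribute attr)
            {
                // Attributes have no ParentNode; detach them from their owner element.
                if (attr.OwnerElement != null)
                {
                    attr.OwnerElement.RemoveAttributeNode(attr);
                    removed++;
                }
            }
            else if (node.ParentNode != null)
            {
                node.ParentNode.RemoveChild(node);
                removed++;
            }
        }
        return removed;
    }
}
```

    The null checks skip nodes that have no parent or owner, such as the document node itself. A node whose ancestor was already detached earlier in the loop is still removed from that detached subtree, which is harmless.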
    

    Explanation:

    1. The XPath union operator | produces the union of two node-sets. Therefore the expression yourExpr1 | yourExpr2 | ... | yourExprN, when evaluated against the XML document, produces the set of all nodes that are selected by any of the N given XPath expressions.

    2. A node $n doesn't belong to a set of nodes $ns exactly when adding $n to $ns makes the set larger, that is, when:

      count($n | $ns) > count($ns)
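
    The membership test can be tried out on its own in C# with XPathNavigator.Evaluate, which returns a bool for a boolean XPath expression. This is only a sketch: MembershipSketch and IsOutside are hypothetical names, and nodeExpr is assumed to select exactly one node.

```csharp
using System.Xml;
using System.Xml.XPath;

public static class MembershipSketch
{
    // True when the single node selected by nodeExpr is NOT in the set selected by setExpr.
    public static bool IsOutside(XmlDocument doc, string nodeExpr, string setExpr)
    {
        XPathNavigator nav = doc.CreateNavigator();
        string test = "count(" + nodeExpr + " | " + setExpr + ") > count(" + setExpr + ")";
        return (bool)nav.Evaluate(test);
    }
}
```

    For a document <r><a/><b/><c/></r> and the set "/r/a | /r/b", IsOutside is false for "/r/a" (the union still has 2 nodes) and true for "/r/c" (the union grows to 3 nodes).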