java xml

XML filtering - quickly find specific nodes by name and remove the parent if the values do not match


I need a hint on how to quickly find specific nodes in an XML document and remove their entire parent node (with its children) if some of the values do not match the input parameters.

For example, given the XML below:

<someparent attr="123" filters="+F1">
    <filter id="F1">
        <width>
            <paper size="a4" val="10" />
            <paper size="a3" val="12" />
        </width>
        <height>
            <paper size="a4" val="10" />
            <paper size="a3" val="12" />
        </height>
    </filter>
</someparent>

I need to apply some rules:

  • if filters has a value starting with + (e.g. +F1) and the parameters match one of the listed sizes and values (a4/10 or a3/12), the someparent node should not be removed; any other size should cause the node to be removed
  • if filters has a value starting with - (e.g. -F1) and the parameters match one of the listed sizes and values (a4/10 or a3/12), the someparent node should be removed; any other size should leave the node intact

However, the exact rules may be irrelevant at this point. The most important part is quickly finding the filter nodes and removing their parent nodes if needed.
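To make the two rules concrete, here is a minimal sketch of the keep/remove decision as a pure function. The class name FilterRule, the method keepParent and the "size/val" string encoding are hypothetical, chosen only for illustration. It assumes that "match" means the requested size/value pair is listed under the filter, and that a missing or unrecognised prefix leaves the node in place.

```java
import java.util.Set;

// Hypothetical helper, not part of any library: decides whether a <someparent> survives.
final class FilterRule {
    private FilterRule() {}

    /**
     * @param filtersAttr value of the "filters" attribute, e.g. "+F1" or "-F1"
     * @param listedPairs "size/val" pairs collected from the <paper> elements of the filter
     * @param size        requested paper size, e.g. "a4"
     * @param val         requested value, e.g. "10"
     * @return true if the parent node should be kept, false if it should be removed
     */
    static boolean keepParent(String filtersAttr, Set<String> listedPairs, String size, String val) {
        if (filtersAttr == null || filtersAttr.isEmpty()) {
            return true; // assumption: no filter reference means the node is untouched
        }
        boolean matches = listedPairs.contains(size + "/" + val);
        switch (filtersAttr.charAt(0)) {
            case '+': return matches;   // "+": keep only when the parameters match
            case '-': return !matches;  // "-": remove when the parameters match
            default:  return true;      // assumption: unknown prefix keeps the node
        }
    }
}
```

With the example above, keepParent("+F1", Set.of("a4/10", "a3/12"), "a4", "10") keeps the node, while the same call with "-F1" removes it.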

Extra notes:

  • XPath is way too slow, literally unacceptable. Iterating over every single node is relatively quick, and that is how it currently works, but I'd like to improve on it. I'm pretty sure it can be improved.
  • the filter node(s) may not exist in the file at all

My plan is to create some prototypes; however, I'd appreciate any hints that may help me.

EDIT: Sorry for the late reply. I ended up using StAX; it's fast and works perfectly for me. Thank you to everyone involved.


Solution

  • In general, the built-in parsers are SAX, StAX and DOM (https://rdayala.wordpress.com/dom-vs-sax-parsers/).

    • DOM is the slow one (it loads everything into memory) and is the one used with XPath.
    • SAX is a pain to use.
    • StAX actually has 2 APIs:
      • the iterator API, e.g. XMLEventReader (easier)
      • the cursor API, e.g. XMLStreamReader (more efficient)

    You could also try XSLT, but the built-in processor isn't necessarily the best performing, and you may need to pay for a premium one to use all of its features (such as streamed processing):
    https://docs.oracle.com/javase/tutorial/jaxp/xslt/transformingXML.html
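
As a rough sketch of the StAX route using the iterator API (XMLEventReader), something along these lines could work. This is not the asker's final code, which was not posted. It assumes that the element to drop is named someparent, that the filter sits inside the someparent it applies to (as in the example), and that every paper element in that subtree counts as a listed size/val pair. The class name StaxParentFilter and the method filter are hypothetical. The events of each someparent are buffered until its end tag, and only then written out or discarded.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical sketch: copies the XML, dropping <someparent> subtrees rejected by their filter.
final class StaxParentFilter {
    private static final QName FILTERS = new QName("filters");
    private static final QName SIZE = new QName("size");
    private static final QName VAL = new QName("val");

    static String filter(String xml, String size, String val) {
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);

            List<XMLEvent> buffer = new ArrayList<>(); // events of the current <someparent>
            Set<String> pairs = new HashSet<>();       // "size/val" pairs seen inside it
            String filters = null;                     // its "filters" attribute, if any
            int depth = 0;                             // > 0 while inside a <someparent>

            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (depth == 0) {
                    if (event.isStartElement() && isParent(event.asStartElement())) {
                        Attribute a = event.asStartElement().getAttributeByName(FILTERS);
                        filters = (a == null) ? null : a.getValue();
                        pairs.clear();
                        buffer.add(event);
                        depth = 1;
                    } else {
                        writer.add(event); // outside any <someparent>: copy straight through
                    }
                    continue;
                }
                buffer.add(event);
                if (event.isStartElement()) {
                    depth++;
                    StartElement se = event.asStartElement();
                    if ("paper".equals(se.getName().getLocalPart())) {
                        Attribute s = se.getAttributeByName(SIZE);
                        Attribute v = se.getAttributeByName(VAL);
                        if (s != null && v != null) {
                            pairs.add(s.getValue() + "/" + v.getValue());
                        }
                    }
                } else if (event.isEndElement() && --depth == 0) {
                    // End of <someparent>: decide, then flush or discard the buffered subtree.
                    if (keep(filters, pairs.contains(size + "/" + val))) {
                        for (XMLEvent e : buffer) {
                            writer.add(e);
                        }
                    }
                    buffer.clear();
                }
            }
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    private static boolean isParent(StartElement se) {
        return "someparent".equals(se.getName().getLocalPart());
    }

    // "+": keep only on a match; "-": drop on a match; anything else: keep (assumption).
    private static boolean keep(String filters, boolean matches) {
        if (filters == null || filters.isEmpty()) return true;
        char mode = filters.charAt(0);
        return mode == '+' ? matches : mode != '-' || !matches;
    }
}
```

Memory use is bounded by the largest single someparent subtree rather than by the whole file. If filter definitions can live outside the element that references them, a first pass to collect them by id would be needed. The cursor API (XMLStreamReader) would avoid allocating event objects, at the cost of writing each event type out by hand.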