Tags: java, xml, xpath, xml-parsing, jaxp

Parsing XML tags nested within other XML values


I am stuck developing an XML parser that has to process a large chunk of XML.

My problem is that I don't know how to parse XML tags nested within the text content of other XML elements. My input file looks something like this:

<main>
<step>
    <para>Calculate the values from the pool</para>
</step>
<step>
        <para>Use these(<internalRef id ="003" xlink:actuate="onRequest" xlink:show="replace" xlink:href="max003"/>) values finally</para>
</step>
</main>

I am able to get the value of the first step tag using XPath. My problem is how to get the value of the second step using XPath, or rather how to detect when a new tag starts inside an element's text.

For example, my XPath for the second step returns this result: Use these() values finally

whereas my aim is to get: Use these(max003) values finally

The max003 value has to be taken from xlink:href.

Addition: I am able to get the individual values of id, actuate, and show by writing separate XPaths. What I need is to take the xlink:href value (max003), insert it inside the parentheses (after "these" and before "values"), and send the resulting string across the wire for display. So I am looking for a way to identify where the nested element starts within the text, and a mechanism to insert its value inside the parentheses.


Solution

  • The evaluation of this XPath expression:

     concat(/*/step[2]/para/text()[1],
            /*/step[2]/para/internalRef/@xlink:href,
            /*/step[2]/para/text()[2])
    

    on the provided XML document (corrected to be namespace-well-formed by declaring the xlink prefix):

    <main xmlns:xlink="Undefined namespace">
        <step>
            <para>Calculate the values from the pool</para>
        </step>
        <step>
            <para>Use these(<internalRef id ="003" xlink:actuate="onRequest" xlink:show="replace" xlink:href="max003"/>) values finally</para>
        </step>
    </main>
    

    produces the wanted result:

    Use these(max003) values finally
    

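    Note: you will need to register the xlink namespace with your XPath API for this expression to be evaluated without an error. In JAXP this means parsing the document with a namespace-aware DocumentBuilderFactory and passing a NamespaceContext to XPath.setNamespaceContext().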

    XSLT-based verification:

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:xlink="Undefined namespace">
     <xsl:output method="text"/>
     <xsl:strip-space elements="*"/>
    
     <xsl:template match="/">
         <xsl:copy-of select=
         "concat(/*/step[2]/para/text()[1],
               /*/step[2]/para/internalRef/@xlink:href,
               /*/step[2]/para/text()[2])
         "/>
     </xsl:template>
    </xsl:stylesheet>
    

    When this transformation is applied to the provided XML document (above), the XPath expression is evaluated and the result of this evaluation is copied to the output:

    Use these(max003) values finally
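
Since the question is tagged java and jaxp, here is a minimal sketch of how registering the xlink namespace and evaluating the concat() expression above might look with the standard JAXP classes. The class name XlinkConcatDemo, the evaluateConcat helper, and the SAMPLE constant are hypothetical names chosen for illustration. The sketch also assumes the standard XLink namespace URI in place of the "Undefined namespace" placeholder used above; substitute whatever URI your real document declares.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class XlinkConcatDemo {
    // Assumption: the standard XLink namespace URI, not the placeholder from the answer.
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    // The sample document from the question, with the xlink prefix declared.
    static final String SAMPLE =
        "<main xmlns:xlink=\"" + XLINK_NS + "\">"
      + "<step><para>Calculate the values from the pool</para></step>"
      + "<step><para>Use these(<internalRef id=\"003\" xlink:actuate=\"onRequest\""
      + " xlink:show=\"replace\" xlink:href=\"max003\"/>) values finally</para></step>"
      + "</main>";

    // The XPath expression from the answer, unchanged.
    static final String EXPR =
        "concat(/*/step[2]/para/text()[1], "
      + "/*/step[2]/para/internalRef/@xlink:href, "
      + "/*/step[2]/para/text()[2])";

    // Hypothetical helper: parses the XML and evaluates EXPR against it.
    static String evaluateConcat(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required, or xlink:href is not in a namespace
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Map the "xlink" prefix used in EXPR to the namespace URI.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "xlink".equals(prefix) ? XLINK_NS : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) {
                    return XLINK_NS.equals(uri) ? "xlink" : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return XLINK_NS.equals(uri)
                        ? Collections.singletonList("xlink").iterator()
                        : Collections.<String>emptyList().iterator();
                }
            });
            return xpath.evaluate(EXPR, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluateConcat(SAMPLE)); // Use these(max003) values finally
    }
}
```

The prefix used in the XPath expression does not have to match the prefix in the document; only the namespace URI returned by the NamespaceContext has to match the one the document declares.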