JDOM XPath Getting Inner Element without Namespace


I have XML like this:

<root
    xmlns:gl-bus="http://www.xbrl.org/int/gl/bus/2006-10-25"
    xmlns:gl-cor="http://www.xbrl.org/int/gl/cor/2006-10-25" >
    <gl-cor:entityInformation>
        <gl-bus:accountantInformation>
            ...............
        </gl-bus:accountantInformation>
    </gl-cor:entityInformation>
</root>

I want to extract the element "gl-cor:entityInformation" from the root, together with its child elements. However, I do not want the namespace declarations to come with it.

The code is like this:

XPathExpression<Element> xpath = XPathFactory.instance().compile("gl-cor:entityInformation", Filters.element(), null, NAMESPACES);
Element innerElement = xpath.evaluateFirst(xmlDoc.getRootElement());

The problem is that the inner element now carries the namespace declarations. Sample output:

<gl-cor:entityInformation xmlns:gl-cor="http://www.xbrl.org/int/gl/cor/2006-10-25">
    <gl-bus:accountantInformation xmlns:gl-bus="http://www.xbrl.org/int/gl/bus/2006-10-25">
    </gl-bus:accountantInformation>
</gl-cor:entityInformation>

This is how I convert the element to a string:

public static String toString(Element element) {
    Format format = Format.getPrettyFormat();
    format.setTextMode(Format.TextMode.NORMALIZE);
    format.setEncoding("UTF-8");

    XMLOutputter xmlOut = new XMLOutputter(); 
    xmlOut.setFormat(format);
    return xmlOut.outputString(element);
}

As you can see, the namespace declarations are repeated on the inner elements. Is there a way to get rid of these declarations without losing the prefixes?

I want this because I will later merge these inner elements into another parent element, and that parent element already has those namespace declarations.


Solution

  • JDOM by design insists that the in-memory model of the XML is well-structured at all times. The behaviour you are seeing is exactly what I would expect from JDOM, and I consider it to be "right". JDOM's XMLOutputter also outputs well-structured and internally consistent XML and XML fragments.

    Changing the behaviour of the internal in-memory model is not an option with JDOM, but customizing the XMLOutputter to change its behaviour is relatively easy. The XMLOutputter is structured to have an "engine" supplied as a constructor argument: XMLOutputter(XMLOutputProcessor). In addition, JDOM supplies an easy-to-customize default XMLOutputProcessor called AbstractXMLOutputProcessor.

    You can get the behaviour you want by doing the following:

    private static final XMLOutputProcessor noNamespaces = new AbstractXMLOutputProcessor() {
    
        @Override
        protected void printNamespace(final Writer out, final FormatStack fstack, 
            final Namespace ns)  throws IOException {
            // Intentionally empty: suppress every xmlns declaration in the output.
        }
    
    };
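
The override works because the processor funnels each namespace declaration through one overridable method. As a rough illustration of that hook pattern only, here is a minimal sketch in plain Java. TagPrinter and NO_NAMESPACES are hypothetical names invented for this example; this is not JDOM's actual implementation, which does considerably more.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HookSketch {

    /** Hypothetical stand-in for an output processor; NOT a JDOM class. */
    static class TagPrinter {

        /** Builds a start tag, delegating each declaration to the hook below. */
        String printStartTag(String qName, Map<String, String> namespaces) {
            StringBuilder out = new StringBuilder("<").append(qName);
            for (Map.Entry<String, String> ns : namespaces.entrySet()) {
                printNamespace(out, ns.getKey(), ns.getValue());
            }
            return out.append(">").toString();
        }

        /** Hook: the default writes an xmlns:prefix="uri" declaration. */
        protected void printNamespace(StringBuilder out, String prefix, String uri) {
            out.append(" xmlns:").append(prefix).append("=\"").append(uri).append("\"");
        }
    }

    /** Same printer with the hook overridden to write nothing. */
    static final TagPrinter NO_NAMESPACES = new TagPrinter() {
        @Override
        protected void printNamespace(StringBuilder out, String prefix, String uri) {
            // Intentionally empty: declarations are dropped, prefixes are kept.
        }
    };

    public static void main(String[] args) {
        Map<String, String> ns = new LinkedHashMap<>();
        ns.put("gl-cor", "http://www.xbrl.org/int/gl/cor/2006-10-25");
        System.out.println(new TagPrinter().printStartTag("gl-cor:entityInformation", ns));
        System.out.println(NO_NAMESPACES.printStartTag("gl-cor:entityInformation", ns));
    }
}
```

The element name keeps its prefix in both cases; only the declaration is affected, which is the effect the question asks for.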
    

    Now, when you create your XMLOutputter to print your XML element fragment, you can do the following:

    public static String toString(Element element) {
        Format format = Format.getPrettyFormat();
        format.setTextMode(Format.TextMode.NORMALIZE);
        format.setEncoding("UTF-8");
    
        XMLOutputter xmlOut = new XMLOutputter(noNamespaces); 
        xmlOut.setFormat(format);
        return xmlOut.outputString(element);
    }
    

    Here's a full program working with your input XML:

    import java.io.IOException;
    import java.io.Writer;
    
    import org.jdom2.Document;
    import org.jdom2.Element;
    import org.jdom2.JDOMException;
    import org.jdom2.Namespace;
    import org.jdom2.filter.Filters;
    import org.jdom2.input.SAXBuilder;
    import org.jdom2.output.Format;
    import org.jdom2.output.XMLOutputter;
    import org.jdom2.output.support.AbstractXMLOutputProcessor;
    import org.jdom2.output.support.FormatStack;
    import org.jdom2.output.support.XMLOutputProcessor;
    import org.jdom2.xpath.XPathExpression;
    import org.jdom2.xpath.XPathFactory;
    
    
    public class JDOMEray {
    
        public static void main(String[] args) throws JDOMException, IOException {
            Document eray = new SAXBuilder().build("eray.xml");
            Namespace[] NAMESPACES = {Namespace.getNamespace("gl-cor", "http://www.xbrl.org/int/gl/cor/2006-10-25")};
            XPathExpression<Element> xpath = XPathFactory.instance().compile("gl-cor:entityInformation", Filters.element(), null, NAMESPACES);
            Element innerElement = xpath.evaluateFirst(eray.getRootElement());
    
            System.out.println(toString(innerElement));
        }
    
        private static final XMLOutputProcessor noNamespaces = new AbstractXMLOutputProcessor() {
    
            @Override
            protected void printNamespace(final Writer out, final FormatStack fstack, 
                final Namespace ns)  throws IOException {
                // Intentionally empty: suppress every xmlns declaration in the output.
            }
    
        };
    
        public static String toString(Element element) {
            Format format = Format.getPrettyFormat();
            format.setTextMode(Format.TextMode.NORMALIZE);
            format.setEncoding("UTF-8");
    
            XMLOutputter xmlOut = new XMLOutputter(noNamespaces); 
            xmlOut.setFormat(format);
            return xmlOut.outputString(element);
        }
    
    
    }
    

    For me the above program outputs:

    <gl-cor:entityInformation>
      <gl-bus:accountantInformation>...............</gl-bus:accountantInformation>
    </gl-cor:entityInformation>
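
One caveat: a fragment printed this way uses prefixes that are not declared anywhere in it, so it only makes sense once it sits inside a parent that declares them, as you intend. The sketch below checks this with the JDK's own namespace-aware parser rather than JDOM. The class name FragmentCheck, the wrap helper and the wrapping root element are assumptions made up for the illustration, and the elided content is replaced by an empty element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class FragmentCheck {

    // Fragment shaped like the namespace-free output (inner content elided).
    static final String FRAGMENT =
          "<gl-cor:entityInformation>"
        + "<gl-bus:accountantInformation/>"
        + "</gl-cor:entityInformation>";

    // Hypothetical parent element that declares both prefixes.
    static String wrap(String fragment) {
        return "<root"
            + " xmlns:gl-bus=\"http://www.xbrl.org/int/gl/bus/2006-10-25\""
            + " xmlns:gl-cor=\"http://www.xbrl.org/int/gl/cor/2006-10-25\">"
            + fragment
            + "</root>";
    }

    // True if the text parses with a namespace-aware JDK parser.
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false; // e.g. an unbound prefix
        }
    }

    public static void main(String[] args) {
        System.out.println("fragment alone: " + parses(FRAGMENT));
        System.out.println("inside parent:  " + parses(wrap(FRAGMENT)));
    }
}
```

The fragment on its own is rejected because its prefixes are unbound, while the wrapped version parses. So the customized outputter is a reasonable fit when the text is pasted into a document that already carries the declarations, and a poor fit for anything that must be parsed standalone.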