java xml stax

Reading element text from XML when the text is not wrapped in its own node


I have an XML file and I am trying to read the text inside the <_3-auto> node using the StAX XML parser. The text is not wrapped in a child node of its own (it sits next to the <prefix> element), so StAX could not read the value. Is there another way to get the value using StAX?

<_3-auto>
    <prefix>
        <autonum>(3)</autonum> 
    </prefix>
    Remove the rear fuselage support from FS755.00 of the aircraft.
</_3-auto>
<_3-auto>
    <prefix>
        <autonum>(4)</autonum> 
    </prefix>
    Put the hydraulic scissor lift (1) under the nose ballast assembly&#8201;(2).
</_3-auto>

This is the code I wrote to read the text inside the _3-auto node.

    try {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        InputStream inputStream = new FileInputStream(filePath);

        XMLStreamReader streamReader = inputFactory.createXMLStreamReader(inputStream);

        while (streamReader.hasNext()) {
            int event = streamReader.next();

            if (event == XMLStreamConstants.START_ELEMENT) {
                if (streamReader.getLocalName().equals("_3-auto")) {
                    String auto = streamReader.getElementText();
                    System.out.println(auto);
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }

Solution

  • You should not use getElementText() here. As the documentation says, it is meant for text-only elements, and it throws an XMLStreamException when it meets a child element such as <prefix>.

    What you need to do instead is to also handle the XMLStreamConstants.CHARACTERS event when it occurs directly inside a <_3-auto> node. A simple way to do this is to keep a flag while parsing that records whether you are currently in such a node. The code below makes the simplifying assumption that you are in this node right after the <_3-auto> START_ELEMENT event or right after the </prefix> END_ELEMENT event:

        // true while the parser is positioned directly inside a <_3-auto> node
        boolean current3AutoNode = false;

        while (streamReader.hasNext()) {
            int event = streamReader.next();

            if (event == XMLStreamConstants.START_ELEMENT) {
                if (streamReader.getLocalName().equals("_3-auto")) {
                    current3AutoNode = true;
                }
                else {
                    current3AutoNode = false;   // entering a child element such as <prefix>
                }
            }
            else if (event == XMLStreamConstants.END_ELEMENT) {
                if (streamReader.getLocalName().equals("prefix")) {
                    current3AutoNode = true;    // after </prefix> we are back in the <_3-auto> node
                }
                else {
                    current3AutoNode = false;
                }
            }
            if (event == XMLStreamConstants.CHARACTERS && current3AutoNode) {
                // these are the characters directly inside <_3-auto> </_3-auto>
                String characters = streamReader.getText();
                System.out.println(characters);
            }
        }
    

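    This prints "Remove the rear fuselage support from FS755.00 of the aircraft." and "Put the hydraulic scissor lift (1) under the nose ballast assembly (2).", together with some extra whitespace (for example the line break and indentation between <_3-auto> and <prefix>) that you can filter out. Note that a StAX parser may deliver one run of text as several CHARACTERS events unless XMLInputFactory.IS_COALESCING is enabled.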
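
    If the child elements are not always named <prefix>, the same idea can be written with a depth counter instead of a flag tied to one element name. The following is only a minimal sketch under a few assumptions: the class DirectTextExtractor and the method directText are hypothetical names made up for this illustration, the XML is passed in as a String for brevity, and surrounding whitespace is removed with trim(). It collects only text that sits directly inside the target element and skips text of any nested child.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of the original answer.
class DirectTextExtractor {

    // Returns the trimmed direct text of every element whose local name is targetName.
    static List<String> directText(String xml, String targetName) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to merge adjacent character events into one.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);

        List<String> result = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        // 0 = outside the target, 1 = directly inside it, >1 = inside one of its children
        int depth = 0;

        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        if (depth > 0) {
                            depth++;                       // entering a child of the target
                        } else if (targetName.equals(reader.getLocalName())) {
                            depth = 1;                     // entering the target itself
                            text.setLength(0);
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        if (depth > 0) {
                            depth--;
                            if (depth == 0) {              // the target element just closed
                                result.add(text.toString().trim());
                            }
                        }
                    } else if ((event == XMLStreamConstants.CHARACTERS
                            || event == XMLStreamConstants.CDATA) && depth == 1) {
                        text.append(reader.getText());     // direct text only
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        // A wrapper <root> is assumed here because the fragment in the question has no single root.
        String xml = "<root><_3-auto>\n"
                + "    <prefix><autonum>(3)</autonum> </prefix>\n"
                + "    Remove the rear fuselage support from FS755.00 of the aircraft.\n"
                + "</_3-auto></root>";
        for (String s : directText(xml, "_3-auto")) {
            System.out.println(s);
        }
    }
}
```

    Because the text is accumulated in a StringBuilder and trimmed only when the target element closes, the whitespace around <prefix> does not show up as separate output lines, and the "(3)" inside <autonum> is skipped since it is read at depth 3.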