
Update XML using XMLStreamWriter


I have a large XML file and I want to update particular nodes in it (for example, by removing duplicate nodes).

Because the XML is huge, I considered using the StAX API class XMLStreamReader. I first read the XML with XMLStreamReader, stored the data in my own objects, and manipulated those objects to remove duplicates.

Now I want to put the updated objects back into my original XML. My idea was to marshal them to a string and place that string at the right position in the input XML, but I have not been able to do this with the StAX class XMLStreamWriter.

Can this be achieved using XMLStreamWriter? If not, please suggest an alternative approach to my problem.

My main concern is memory: I cannot load such huge XML files into the memory of our project server, which is shared across multiple processes. Hence I do not want to use DOM, because it would need a lot of memory to load these huge XML files.


Solution

  • If you need to alter a particular value, such as text content or a tag name, StAX can help. It can also remove a few elements, using XMLInputFactory.createFilteredReader.

    The code below renames Name to AuthorName, adds a comment inside it, and replaces "Hello" with "Oh" in text content.

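    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import javax.xml.namespace.QName;
    import javax.xml.stream.XMLEventFactory;
    import javax.xml.stream.XMLEventReader;
    import javax.xml.stream.XMLEventWriter;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLOutputFactory;
    import javax.xml.stream.XMLStreamException;
    import javax.xml.stream.events.EndElement;
    import javax.xml.stream.events.StartElement;
    import javax.xml.stream.events.XMLEvent;

    public class StAx {
        public static void main(String[] args) throws IOException, XMLStreamException {

            String filename = "HelloWorld.xml";

            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLOutputFactory xof = XMLOutputFactory.newInstance();
            XMLEventFactory ef = XMLEventFactory.newInstance();

            // Only the input file is closed by try-with-resources; System.out stays open.
            try (InputStream in = new FileInputStream(filename)) {
                // Events are pulled one at a time, so the whole document is never held in memory.
                XMLEventReader reader = factory.createXMLEventReader(filename, in);
                XMLEventWriter writer = xof.createXMLEventWriter(System.out);

                while (reader.hasNext()) {
                    XMLEvent event = reader.nextEvent();
                    if (event.isCharacters()) {
                        // Replace "Hello" with "Oh" in text content.
                        String data = event.asCharacters().getData();
                        if (data.contains("Hello")) {
                            event = ef.createCharacters(data.replace("Hello", "Oh"));
                        }
                        writer.add(event);
                    } else if (event.isStartElement()
                            && event.asStartElement().getName().getLocalPart().equals("Name")) {
                        StartElement s = event.asStartElement();
                        // Rename <Name> to <AuthorName>, keeping its attributes and namespace declarations.
                        writer.add(ef.createStartElement(new QName("Author" + s.getName().getLocalPart()),
                                s.getAttributes(), s.getNamespaces()));
                        // Add an indented comment as the first child of the renamed element.
                        writer.add(ef.createCharacters("\n            "));
                        writer.add(ef.createComment("auto generated comment"));
                    } else if (event.isEndElement()
                            && event.asEndElement().getName().getLocalPart().equals("Name")) {
                        EndElement e = event.asEndElement();
                        // Rename the matching end tag as well, so the result does not rely on
                        // the writer ignoring the name stored in end-element events.
                        writer.add(ef.createEndElement(new QName("Author" + e.getName().getLocalPart()),
                                e.getNamespaces()));
                    } else {
                        writer.add(event);
                    }
                }
                writer.flush();
                writer.close();
                reader.close();
            }
        }
    }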
    

    Input (HelloWorld.xml)

    <?xml version="1.0"?>
    <BookCatalogue>
        <Book>
            <Title>HelloLord</Title>
            <Name>
                <first>New</first>
                <last>Earth</last>
            </Name>
            <ISBN>12345</ISBN>
        </Book>
        <Book>
            <Title>HelloWord</Title>
            <Name>
                <first>New</first>
                <last>Moon</last>
            </Name>
            <ISBN>12346</ISBN>
        </Book>
    </BookCatalogue>
    

    Output

    <?xml version="1.0"?><BookCatalogue>
        <Book>
            <Title>OhLord</Title>
            <AuthorName>
                <!--auto generated comment-->
                <first>New</first>
                <last>Earth</last>
            </AuthorName>
            <ISBN>12345</ISBN>
        </Book>
        <Book>
            <Title>OhWord</Title>
            <AuthorName>
                <!--auto generated comment-->
                <first>New</first>
                <last>Moon</last>
            </AuthorName>
            <ISBN>12346</ISBN>
        </Book>
    </BookCatalogue>
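
On the original question (putting a marshalled object back "at the right position"): a StAX writer cannot edit the input file in place, but the same read-and-copy loop can produce a new document and substitute a fragment when it reaches the element to replace. The sketch below assumes the replacement is already a well-formed string with a single root element (for example, the output of a marshaller); the class FragmentSplicer and its method are hypothetical, and every element with the target name is replaced. Apart from the fragment string, only the current event is held in memory.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.Writer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

/** Hypothetical helper: copies XML from in to out, swapping elements for a pre-serialized fragment. */
final class FragmentSplicer {
    private static final XMLInputFactory IN = XMLInputFactory.newInstance();
    private static final XMLOutputFactory OUT = XMLOutputFactory.newInstance();

    /** Replaces every element whose local name is target (with its whole subtree) by fragment. */
    static void replaceElement(Reader in, Writer out, String target, String fragment)
            throws XMLStreamException {
        XMLEventReader reader = IN.createXMLEventReader(in);
        XMLEventWriter writer = OUT.createXMLEventWriter(out);
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartDocument() || event.isEndDocument()) {
                continue; // the XML declaration is not copied in this sketch
            }
            if (event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals(target)) {
                skipSubtree(reader);             // drop the old element and everything inside it
                writeFragment(writer, fragment); // write the new content in its place
            } else {
                writer.add(event);               // everything else is copied unchanged
            }
        }
        writer.flush();
        writer.close();
        reader.close();
    }

    /** Consumes events up to and including the end tag of an element whose start tag was just read. */
    private static void skipSubtree(XMLEventReader reader) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            XMLEvent e = reader.nextEvent();
            if (e.isStartElement()) {
                depth++;
            } else if (e.isEndElement()) {
                depth--;
            }
        }
    }

    /** Parses a well-formed fragment and forwards its events, minus the document start/end. */
    private static void writeFragment(XMLEventWriter writer, String fragment)
            throws XMLStreamException {
        XMLEventReader fr = IN.createXMLEventReader(new StringReader(fragment));
        while (fr.hasNext()) {
            XMLEvent e = fr.nextEvent();
            if (!e.isStartDocument() && !e.isEndDocument()) {
                writer.add(e);
            }
        }
        fr.close();
    }
}
```

For huge files, pass a reader over the input file and a writer over a new output file rather than strings; once the copy succeeds, the new file can replace the original.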
    

    As you can see, things get really complicated when the modification goes beyond this, for example swapping two nodes, or deleting a node based on the state of other nodes (such as deleting all books priced above the average price).
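
Deleting a node based on other nodes can still be done in one streaming pass when the decision depends only on the node itself and on what came before it, as with the duplicate removal the question asks about. A minimal sketch, assuming a duplicate is a record element whose key child (for example Book and ISBN) has text already seen earlier in the file: each record's events are buffered until its end tag, then written or dropped. The class and method names are hypothetical. Memory use is one record plus the set of seen keys, and that set still grows with the number of distinct keys.

```java
import java.io.Reader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

/** Hypothetical helper: drops repeated records while streaming from in to out. */
final class DuplicateRemover {

    /**
     * Copies in to out, dropping each record element whose key child has text that was
     * already seen. Only the current record is buffered; records without a key are kept.
     */
    static void removeDuplicates(Reader in, Writer out, String record, String key)
            throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        Set<String> seen = new HashSet<>();
        List<XMLEvent> buffer = null;  // events of the current record; null outside a record
        StringBuilder keyText = null;  // non-null while inside the record's key child
        String recordKey = null;       // key of the current record, once its key child has ended
        int depth = 0;                 // element depth relative to the current record

        while (reader.hasNext()) {
            XMLEvent e = reader.nextEvent();
            if (e.isStartDocument() || e.isEndDocument()) {
                continue; // the XML declaration is not copied in this sketch
            }
            if (buffer == null) {
                if (e.isStartElement() && localName(e).equals(record)) {
                    buffer = new ArrayList<>();  // start buffering a new record
                    buffer.add(e);
                    depth = 1;
                    recordKey = null;
                } else {
                    writer.add(e);               // outside any record: copy straight through
                }
                continue;
            }
            buffer.add(e);
            if (e.isStartElement()) {
                depth++;
                if (depth == 2 && localName(e).equals(key)) {
                    keyText = new StringBuilder();
                }
            } else if (e.isCharacters() && keyText != null) {
                keyText.append(e.asCharacters().getData());
            } else if (e.isEndElement()) {
                depth--;
                if (depth == 1 && keyText != null) {
                    recordKey = keyText.toString().trim(); // key child just closed
                    keyText = null;
                }
                if (depth == 0) {
                    // Record complete: write it only if its key has not been seen before.
                    if (recordKey == null || seen.add(recordKey)) {
                        for (XMLEvent b : buffer) {
                            writer.add(b);
                        }
                    }
                    buffer = null;
                }
            }
        }
        writer.flush();
        writer.close();
        reader.close();
    }

    private static String localName(XMLEvent e) {
        return e.asStartElement().getName().getLocalPart();
    }
}
```

Whitespace that preceded a dropped record is still copied, so the output keeps some extra indentation where duplicates were. A rule that needs content appearing later in the file, such as the above-average price, would need a second pass.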

    The best solution in such cases is to produce the resulting XML with an XSLT transformation.
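
A sketch of the XSLT route using the JDK's built-in javax.xml.transform API: the stylesheet is an identity transform plus one empty template that drops every Book priced above the average, the example mentioned earlier. It assumes XSLT 1.0 and a hypothetical Price child on each Book (the sample input above has none); the class XsltFilter is also hypothetical. Keep in mind that XSLT 1.0 processors, including the one bundled with the JDK, typically build the whole source tree in memory, which conflicts with the memory constraint in the question. XSLT 3.0 adds a streaming mode, but only some processors implement it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper: runs an XSLT 1.0 stylesheet with the JDK's built-in processor. */
final class XsltFilter {

    /** Identity transform plus an empty template that drops Books priced above the average. */
    static final String DROP_ABOVE_AVERAGE =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output omit-xml-declaration=\"yes\"/>"
        // Copy every node and attribute unchanged by default.
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        // An empty template outputs nothing, so matching Books are removed.
        + "<xsl:template match=\"Book[Price &gt; sum(/BookCatalogue/Book/Price)"
        + " div count(/BookCatalogue/Book/Price)]\"/>"
        + "</xsl:stylesheet>";

    static String transform(String xml, String stylesheet) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

The more specific Book[...] pattern has a higher default priority than node(), so it wins over the identity template for the Books it matches.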