java xml dom sax stax

How do I add an attribute to XML nodes faster for more than 10,000 records? (Java)


I have to add an attribute to XML nodes in a document with more than 10k records. What is the fastest way to transform the XML document?

I have tried a StAX parser, which takes almost 4 minutes to add the attribute; with a SAX parser it takes about 5 minutes.

Is there another library that does this better, or a different approach altogether? Any suggestions are welcome.

Sample code (using the StAX parser):

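try {
        // Create the input factory and open a stream reader on the input file
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader r = factory.createXMLStreamReader(new FileInputStream(inputfile));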
        /* Start Writing document */
        XMLOutputFactory xmlOutputFactory = XMLOutputFactory.newInstance();
        XMLEventWriter xmlEventWriter = xmlOutputFactory.createXMLEventWriter(new FileOutputStream(outputfile),
                "UTF-8");
        /* End Writing document */
        int event = r.getEventType();
        long startTime = System.currentTimeMillis();
        System.out.println("Started reading node from xml document....." + TimeUnit.MILLISECONDS.toSeconds(startTime));
        int node1Cnt = 0, node2Cnt = 0, node3Cnt = 0, node4Cnt = 0;
        while (true) {
            XMLEventFactory eventFactory = XMLEventFactory.newInstance();
            switch (event) {
                case XMLStreamConstants.START_DOCUMENT:
                    // System.out.println("Start Document.");
                    StartDocument startDocument = eventFactory.createStartDocument();
                    xmlEventWriter.add(startDocument);
                    break;
                case XMLStreamConstants.START_ELEMENT:
                    // Create Start node
                    if (r.getLocalName().equalsIgnoreCase(node1)) {
                        node1Cnt++;
                        node2Cnt = 0;
                        Attribute attribute = eventFactory.createAttribute("id", "5522" + node1Cnt);
                        List attributeList = Arrays.asList(attribute);
                        List nsList = Arrays.asList();
                        StartElement sElement = eventFactory.createStartElement("", "", r.getLocalName(),attributeList.iterator(), nsList.iterator());
                        xmlEventWriter.add(sElement);
                    } else if (r.getLocalName().equalsIgnoreCase(node2)) {
                        node2Cnt++;
                        Attribute attribute = eventFactory.createAttribute("id", "5522" + node1Cnt + node2Cnt);
                        List attributeList = Arrays.asList(attribute);
                        List nsList = Arrays.asList();
                        StartElement sElement = eventFactory.createStartElement("", "", r.getLocalName(),
                                attributeList.iterator(), nsList.iterator());
                        xmlEventWriter.add(sElement);
                    } else {
                        // Any other element is copied without an id attribute
                        StartElement sElement = eventFactory.createStartElement("", "", r.getLocalName());
                        xmlEventWriter.add(sElement);
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                    if (r.isWhiteSpace())
                        break; // System.out.println("Text: " + r.getText());
                    Characters characters = eventFactory.createCharacters(r.getText());
                    xmlEventWriter.add(characters);
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    // System.out.println("End Element:" + r.getName());
                    EndElement endElement = eventFactory.createEndElement("", "", r.getLocalName());
                    xmlEventWriter.add(endElement);
                    break;
                case XMLStreamConstants.END_DOCUMENT:
                    xmlEventWriter.add(eventFactory.createEndDocument());
                    break;
            }
            if (!r.hasNext())
                break;

            event = r.next();
        }
        r.close();
        // Flush and close the writer so that buffered output reaches the file
        xmlEventWriter.close();
        System.out.println("Ended reading node from xml document....."
                + (TimeUnit.MILLISECONDS.toSeconds(System.currentTimeMillis())
                        - TimeUnit.MILLISECONDS.toSeconds(startTime)));
    }catch(XMLStreamException ex){
        ex.printStackTrace();
    }catch(IOException ex){
        // TODO Auto-generated catch block
        ex.printStackTrace();
    }finally{
        System.out.println("finish!!");
    }

Solution

  • I suspect that XMLEventFactory.newInstance() is very expensive, because it involves a search of the classpath. There is absolutely no need to create a new factory within the event loop: create one factory at the start and reuse it.

    Going beyond that, I suspect that using an XMLStreamWriter is probably both easier and faster than using an XMLEventWriter.

    (But these performance conjectures are guesses. As always when tuning performance, you will need to make measurements to assess the effect of code changes.)
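The two suggestions above (create the factories once, and write with an XMLStreamWriter instead of an XMLEventWriter) could be sketched roughly as follows. This is a minimal sketch, not a tuned implementation: the class name `StaxIdAdder` and the method `addIds` are made up for illustration, the element names `node1` and `node2` are taken literally from the question, and it assumes the input uses no namespaces and has no existing `id` attribute on those elements. Like the code in the question, it drops whitespace-only text, comments and processing instructions.

```java
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper class; the name is an assumption, not part of any library.
class StaxIdAdder {
    // Factories are looked up once and reused, never recreated per event.
    private static final XMLInputFactory IN = XMLInputFactory.newInstance();
    private static final XMLOutputFactory OUT = XMLOutputFactory.newInstance();

    /** Copies the XML from in to out, adding an id attribute to node1 and node2. */
    static void addIds(InputStream in, OutputStream out) throws XMLStreamException {
        XMLStreamReader r = IN.createXMLStreamReader(in);
        XMLStreamWriter w = OUT.createXMLStreamWriter(out, "UTF-8");
        int node1Cnt = 0;
        int node2Cnt = 0;
        w.writeStartDocument("UTF-8", "1.0");
        while (r.hasNext()) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    String name = r.getLocalName();
                    w.writeStartElement(name);
                    // Copy the attributes the element already has (no namespaces assumed).
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        w.writeAttribute(r.getAttributeLocalName(i), r.getAttributeValue(i));
                    }
                    if (name.equals("node1")) {
                        node1Cnt++;
                        node2Cnt = 0; // node2 numbering restarts under each node1
                        w.writeAttribute("id", "5522" + node1Cnt);
                    } else if (name.equals("node2")) {
                        node2Cnt++;
                        w.writeAttribute("id", "5522" + node1Cnt + node2Cnt);
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                    // Skip whitespace-only text, as the question's code does.
                    if (!r.isWhiteSpace()) {
                        w.writeCharacters(r.getText());
                    }
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    w.writeEndElement();
                    break;
                default:
                    break; // comments, processing instructions etc. are dropped
            }
        }
        w.writeEndDocument();
        w.close();
        r.close();
    }
}
```

For files, this would be called with a `FileInputStream` and a `FileOutputStream`. Wrapping them in `BufferedInputStream` and `BufferedOutputStream` may also help, but that is another thing to measure rather than assume.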

    Personally I would write this in XSLT. You haven't given quite enough detail of the transformation, but it's something like this in XSLT 3.0:

    <xsl:transform....>
    
    <xsl:mode on-no-match="shallow-copy"/>
    
    <xsl:template match="node1">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:variable name="node1id" as="xs:string">
          <xsl:value-of>
            <xsl:text>5522</xsl:text>
            <xsl:number/>
          </xsl:value-of>
        </xsl:variable>
        <xsl:attribute name="id" select="$node1id"/>
        <xsl:apply-templates>
          <xsl:with-param name="node1id" select="$node1id" tunnel="yes"/>
        </xsl:apply-templates>
      </xsl:copy>
    </xsl:template>
    
    <xsl:template match="node2">
      <xsl:param name="node1id" tunnel="yes"/>
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:attribute name="id">
          <xsl:value-of select="$node1id"/>
          <xsl:number/>
        </xsl:attribute>
        <xsl:apply-templates/>
      </xsl:copy>
    </xsl:template>
    
    </xsl:transform>
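To run a stylesheet like this from Java, the standard JAXP API (`javax.xml.transform`) can be used. The sketch below is an illustration under some assumptions. The XSLT processor bundled with the JDK implements XSLT 1.0 only, so the embedded stylesheet is a 1.0 adaptation of the one above: it uses an identity template in place of `xsl:mode`, and `xsl:number count="node1"` in place of the tunnel parameter. Running the XSLT 3.0 version as written would need a 3.0 processor such as Saxon on the classpath. The class name `XsltIdAdder` and its methods are invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class; the name is an assumption, not part of any library.
class XsltIdAdder {
    // XSLT 1.0 adaptation of the stylesheet in the answer (an assumption of this sketch).
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        // Identity template: copies everything that has no more specific rule.
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
        // node1: id = 5522 + position among its node1 siblings.
      + "<xsl:template match='node1'>"
      + "<xsl:copy><xsl:copy-of select='@*'/>"
      + "<xsl:attribute name='id'>5522<xsl:number/></xsl:attribute>"
      + "<xsl:apply-templates/></xsl:copy>"
      + "</xsl:template>"
        // node2: id = 5522 + number of the enclosing node1 + own number.
      + "<xsl:template match='node2'>"
      + "<xsl:copy><xsl:copy-of select='@*'/>"
      + "<xsl:attribute name='id'>5522<xsl:number count='node1'/><xsl:number/></xsl:attribute>"
      + "<xsl:apply-templates/></xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Compiles the stylesheet; do this once and reuse the Templates object. */
    static Templates compile() throws TransformerException {
        return TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(STYLESHEET)));
    }

    /** Applies the compiled stylesheet to an XML string and returns the result. */
    static String transform(Templates templates, String xml) throws TransformerException {
        Transformer t = templates.newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

For large files, passing `new StreamSource(inputFile)` and `new StreamResult(outputFile)` with `java.io.File` arguments avoids holding the document in a string. Whether this beats the StAX version for a given input is something to measure.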