
Removing comments from an XML file with vtd-xml


Is there a way to remove the comments from a huge XML file (>200 MB) parsed by vtd-xml?

This applies both to comments before the root element

<!-- comment -->
<rootElement>
.
.
.
 </rootElement>

and comments within

<rootElement>
<book>
<!-- comment -->
</book>
</rootElement>

The best solution would use XPath. I tried

//comment()

which works with DOM but not with vtd-xml.
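For comparison, the DOM variant can be sketched roughly as follows. This is only a minimal illustration using the JDK's built-in javax.xml APIs; the class name DomCommentDemo and the helper stripComments are made up for this example, and building a DOM tree for a file of more than 200 MB may need far more memory than is practical.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import java.io.StringReader;
import java.io.StringWriter;

class DomCommentDemo {

    // Hypothetical helper: parses the XML, removes every node matched by
    // //comment(), and returns the serialized document.
    static String stripComments(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Select all comment nodes, including those outside the root element.
        NodeList comments = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//comment()", doc, XPathConstants.NODESET);

        // Detach each comment from its parent node.
        for (int i = 0; i < comments.getLength(); i++) {
            Node comment = comments.item(i);
            comment.getParentNode().removeChild(comment);
        }

        // Serialize the modified tree back to a string.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!-- comment --><rootElement><book><!-- comment --></book></rootElement>";
        System.out.println(stripComments(xml));
    }
}
```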

Here is my code for selecting comments:

String xPath = "//comment()";
XMLModifier xm = new XMLModifier();
VTDGen vg = new VTDGen();
if (vg.parseFile(fnIn, true)) {
    VTDNav vn = vg.getNav();
    xm.bind(vn);
    nodeXpath(xPath, vn);
}

private void nodeXpath(String xPath, VTDNav vn) throws Exception{
    int result;

    AutoPilot ap = new AutoPilot();
    ap.selectXPath(xPath);
    ap.bind(vn);
    while((result = ap.evalXPath())!=-1){
        int p = vn.getText();

        if (p!=-1) {                
            System.out.println(vn.getText() + ", " + vn.toString(p));               
        }
    }
}

But nothing is printed to the screen.

Is there a way to do that with vtd-xml?

Thanks for your help.


Solution

  • You mentioned that your code prints nothing to the screen. Not even the commas? I wouldn't necessarily expect getText() to find anything here, since the doc for getText() seems to indicate that it returns the token index of "the type character data or CDATA", which I don't think includes the content of a comment. (Thank you, @vtd-xml-author, for confirming that.)

    A good test would be to print something in every iteration of your while loop before p = vn.getText(), so you'll know whether it's finding the comments at all.

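    If it is finding the comments, I think you'll want to call xm.removeToken(result) on each one, and then write the modified document out with xm.output(...), since removeToken only records the deletion.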
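
    The two suggestions above could be combined along these lines. Treat it as a minimal sketch, not tested against your file: it assumes the vtd-xml jar (package com.ximpleware) is on the classpath and that your version's XPath engine supports the comment() node test; the class name VtdCommentRemover and the helper removeComments are made up for illustration. I also haven't checked whether //comment() matches comments before the root element in every vtd-xml version, so inspect the output on your own data.

```java
import com.ximpleware.AutoPilot;
import com.ximpleware.VTDGen;
import com.ximpleware.VTDNav;
import com.ximpleware.XMLModifier;

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class VtdCommentRemover {

    // Hypothetical helper: returns a copy of the document with the matched
    // comment tokens removed.
    static byte[] removeComments(byte[] xml) throws Exception {
        VTDGen vg = new VTDGen();
        vg.setDoc(xml);
        vg.parse(true); // namespace-aware, like parseFile(fnIn, true) in the question

        VTDNav vn = vg.getNav();
        XMLModifier xm = new XMLModifier(vn);
        AutoPilot ap = new AutoPilot(vn);
        ap.selectXPath("//comment()");

        int result;
        while ((result = ap.evalXPath()) != -1) {
            // Diagnostic output: shows whether the XPath matches anything at all.
            System.out.println("match at token " + result + ": " + vn.toString(result));

            // Only touch tokens that really are comments.
            if (vn.getTokenType(result) == VTDNav.TOKEN_COMMENT) {
                xm.removeToken(result); // records the deletion; nothing is written yet
            }
        }

        // Write the document with the recorded deletions applied.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        xm.output(out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] in = "<rootElement><book><!-- comment --></book></rootElement>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(removeComments(in), StandardCharsets.UTF_8));
    }
}
```

    For a file on disk you would keep vg.parseFile(fnIn, true) from your code and write the result with xm.output, passing either a file name or an OutputStream.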