
ArrayIndexOutOfBoundsException in xerces parsing


Parsing a large XML file fails with the exception below, and I cannot tell where the problem is. Any help is appreciated, thanks!

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 8192

    at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:543)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.arrangeCapacity(XMLEntityScanner.java:1619)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipString(XMLEntityScanner.java:1657)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1740)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2930)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:277)
    at myPackage.MainClass.main(MainClass.java:39)

The skeleton of the code in the main class is as follows:

SAXParserFactory sf = SAXParserFactory.newInstance();
SAXParser sax = sf.newSAXParser();
sax.parse("english.xml", new DefaultElementHandler("page") {
    public void processElement(Element element) {
        // process the element
    }
});

The XML file is huge (4 GB) and consists mostly of text. I need to parse the file and process that text.

For now the processing step does nothing except print the elements to the console, and the parse still ends with the out-of-bounds exception.


Solution

  • You might want to try printing the error message that goes along with that stack trace. You can do that by adding a call to System.err.println(e.getMessage()), where e is the caught exception. The message should give you the index that the parser was trying to access.

    If the index is negative, an integer overflow is the most likely cause. If that's the case, you should file a bug report with Xerces. It's possible that Xerces wasn't designed to handle files that large.
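
The two suggestions above can be sketched roughly as follows. This is only an illustration, not the asker's code: the class name ParseDiagnostics and the helpers parseAndReport and advance are hypothetical, a plain DefaultHandler stands in for the asker's DefaultElementHandler, and a tiny in-memory document stands in for english.xml. The first helper catches the exception and prints its message; the second shows how an int counter wraps to a negative value once it passes Integer.MAX_VALUE, which is the kind of overflow a negative index would point to.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class ParseDiagnostics {

    // Hypothetical helper: parses the stream and, if the parser fails with an
    // ArrayIndexOutOfBoundsException, prints and returns the exception message.
    // Returns null when the parse completes without that exception.
    public static String parseAndReport(InputStream in) throws Exception {
        SAXParser sax = SAXParserFactory.newInstance().newSAXParser();
        try {
            // A no-op handler stands in for the real element handler (assumption).
            sax.parse(in, new DefaultHandler());
            return null;
        } catch (ArrayIndexOutOfBoundsException e) {
            // The message carries the offending index.
            System.err.println(e.getMessage());
            return e.getMessage();
        }
    }

    // Hypothetical helper: int addition wraps silently on overflow, so a
    // position counter that passes Integer.MAX_VALUE becomes negative.
    public static int advance(int offset, int step) {
        return offset + step;
    }

    public static void main(String[] args) throws Exception {
        // Small in-memory document used in place of the 4 GB file.
        InputStream in = new ByteArrayInputStream(
                "<page>text</page>".getBytes(StandardCharsets.UTF_8));
        parseAndReport(in);

        System.out.println(advance(Integer.MAX_VALUE, 1)); // prints -2147483648
    }
}
```

If the printed index turns out to be negative for the real file, that would support the overflow explanation; a non-negative index equal to a buffer size would suggest a different cause.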