Using Android SAXParser, one of my XML elements is mysteriously breaking in half


And it's not '&'.

I'm using the SAXParser object to parse the actual XML.

This is normally done by passing a URL to the XMLReader.parse method. Because my XML comes from a POST request to a web service, I save the result as a String and then use StringReader / InputSource to feed this string to the XMLReader.parse method.

However, something strange is happening at the 2001st character of the XMLstring.
The 'characters' method of the document handler is being called TWICE in between the startElement and endElement methods, effectively breaking my string (in this case a project title) into two pieces. Because I am instantiating objects in my characters method, I am getting two objects instead of one.

This line, about 2000 characters into the string, fires 'characters' twice, breaking between "LOWER" and "LEVEL":

<title>SUMC-BOOKSTORE, LOWER LEVEL RENOVATIONS</title>

When I bypass the StringReader / InputSource workaround and feed a flat XML file to XMLReader.parse, it works absolutely fine.

Something about StringReader and/or InputSource is somehow screwing this up.

Here is my method that takes an XML string and parses it through the SAXParser.

    public void parseXML(String XMLstring) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            SAXParser sp = spf.newSAXParser();
            XMLReader xr = sp.getXMLReader();
            xr.setContentHandler(this);

            // Something is happening in the StringReader or InputSource
            // that cuts the XML element in half at the 2001 character mark.

            StringReader sr = new StringReader(XMLstring);
            InputSource is = new InputSource(sr);
            xr.parse(is);

        } catch (IOException e) {
            Log.e("CMS1", e.toString());
        } catch (SAXException e) {
            Log.e("CMS2", e.toString());
        } catch (ParserConfigurationException e) {
            Log.e("CMS3", e.toString());
        }
    }

I would greatly appreciate any ideas on how to not have 'characters' firing off twice when I get to this point in the XML String.

Or, show me how to use a POST request and still pass the URL to the parse method.

THANK YOU.


Solution

  • As donroby said, it's perfectly legitimate for the parser to call the characters method more than once between startElement and endElement. That isn't "misbehaving" at all, and you shouldn't try to finagle things so that it doesn't happen. Your parser seems to be using a 2000-character buffer, but there are other reasons it might break a text node into parts.

    What you should do is to accumulate data in the characters method and process it later, in the endElement method when you are sure you have accumulated all of the character data for the node.
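
    The accumulate-then-process approach could look roughly like the sketch below. The class name TitleHandler, the parseTitles helper, and the choice to collect titles as plain strings are assumptions made for illustration, not the asker's actual code. The buffer is reset in startElement, which is enough for simple leaf elements like <title> but would need more care for mixed content.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of every <title> element.
class TitleHandler extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // Begin each element with an empty buffer.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node, so only append here.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Some parsers fill localName, others only qName; accept either.
        String name = (localName == null || localName.isEmpty()) ? qName : localName;
        // The text node is complete now, so this is the place to build objects.
        if ("title".equals(name)) {
            titles.add(text.toString().trim());
        }
        text.setLength(0);
    }

    List<String> getTitles() {
        return titles;
    }

    // Parses an XML string and returns the titles found in it.
    static List<String> parseTitles(String xml) {
        try {
            TitleHandler handler = new TitleHandler();
            XMLReader xr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            xr.setContentHandler(handler);
            xr.parse(new InputSource(new StringReader(xml)));
            return handler.getTitles();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

    With a handler along these lines, it no longer matters how many times characters fires for a given element: each <title> yields exactly one string, assembled from however many chunks the parser delivered.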