
RSS Reader using Sax Parser losing characters from title


I'm trying to use a SAX parser to return the contents of an RSS feed from a URL (http://pitchfork.com/rss/news/), but characters are often lost when the title is displayed: it shows only partial text, or just a closing angle bracket ">".

How can I modify my handler class to prevent this? I think I should probably use StringBuilder or StringBuffer, but I'm not sure how to implement it.

ParseHandler.java

public class RssParseHandler extends DefaultHandler {
//Parsed items
private List<RssItem> rssItems;
private RssItem currentItem;
private boolean parsingTitle;
private boolean parsingLink;
private boolean parsing_id;
private boolean parsingDescription;

public RssParseHandler() {
    rssItems = new ArrayList<RssItem>();
}

public List<RssItem> getItems() {
    return rssItems;
}

//Creates empty RssItem object during the process of an item start tag
//Indicators are set to true when particular tag is being processed
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {

    if ("item".equals(qName)) {
        currentItem = new RssItem();

    } else if ("title".equals(qName)) {
        parsingTitle = true;


    } else if ("link".equals(qName)) {
        parsingLink = true;


    } else if ("_id".equals(qName)) {
        parsing_id = true;


    } else if ("description".equals(qName)) {
        parsingDescription = true;

    }
}

//Current RssItem is added to the list following process of end tag
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {

    if ("item".equals(qName)) {
        rssItems.add(currentItem);
        currentItem = null;

    } else if ("title".equals(qName)) {
        parsingTitle = false;

    } else if ("link".equals(qName)) {
        parsingLink = false;

    } else if ("_id".equals(qName)) {
        parsing_id = false;

    } else if ("description".equals(qName)) {
        parsingDescription = false;
    }
}

@Override
public void characters(char[] ch, int start, int length) throws SAXException {

    if (parsingTitle) {
        if (currentItem != null)
            currentItem.setTitle(new String(ch, start, length));

    } else if (parsingLink) {
        if (currentItem != null) {
            currentItem.setLink(new String(ch, start, length));
            parsingLink = false;
        }

    } else if (parsing_id) {
        if (currentItem != null) {
            currentItem.set_id(new String(ch, start, length));
            parsing_id = false;
        }

    } else if (parsingDescription) {
        if (currentItem != null) {
            currentItem.setDescription(new String(ch, start, length));
            parsingDescription = false;
        }

    }
}}//rssHandlerClass

Solution

  • Use a StringBuilder to accumulate the element's text, rather than creating a new String on every characters() call. As the documentation says:

    The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information.

    @CommonsWare says exactly this in his post here.

    Accumulate the text with a StringBuilder as it is found, since it may arrive in several chunks rather than as one complete string (this explains the incomplete titles!). You may or may not need the isBuilding flag; I don't know your entire implementation, so I added it just in case.

       StringBuilder mSb;
       boolean isBuilding;
    
       @Override
       public void startElement(String uri, String localName, String qName,
             Attributes attributes) throws SAXException {
    
            mSb = new StringBuilder();
            isBuilding = true;
    
            if(qName.equals("title")){
                parsingTitle = true;
            }
            ...
            ...
        }
    
        @Override
        public void characters(char[] ch, int start, int length) {
            // SAX may deliver one element's text across several calls, so append each chunk.
            if (mSb != null && isBuilding) {
                mSb.append(ch, start, length);
            }
        }
    
        @Override
        public void endElement(String uri, String localName, String qName)
            throws SAXException {
    
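            if (parsingTitle) {
                // The feed-level <title> sits outside any <item>, so currentItem may be null here.
                if (currentItem != null) {
                    currentItem.setTitle(mSb.toString().trim());
                }
                parsingTitle = false;
                isBuilding = false;
            }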
        }
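
For reference, the accumulate-then-commit idea above can be sketched as one small, self-contained handler. This is only an illustration under assumptions, not the asker's actual class: the names TitleCollector and parseTitles are hypothetical, it collects only item titles into a List<String> instead of RssItem objects, and it parses an inline XML string rather than the real feed. The title deliberately contains entities (&amp; and &gt;), because parsers commonly split character data around them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the <title> text of every <item>.
class TitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inItem;

    List<String> getTitles() {
        return titles;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("item".equals(qName)) {
            inItem = true;
        }
        // Start a fresh buffer for the element that is opening.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element; always append, never overwrite.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("item".equals(qName)) {
            inItem = false;
        } else if (inItem && "title".equals(qName)) {
            // Only now is the element's text known to be complete.
            titles.add(text.toString().trim());
        }
    }

    // Hypothetical helper: parses an XML string and returns the item titles.
    static List<String> parseTitles(String xml) throws Exception {
        TitleCollector handler = new TitleCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getTitles();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss><channel><title>Feed</title>"
                + "<item><title>Rock &amp; Roll &gt; Pop</title></item>"
                + "</channel></rss>";
        System.out.println(parseTitles(xml));
    }
}
```

The key design choice is that the value is committed in endElement, not in characters: however the parser splits the text, the buffer holds the whole string by the time the closing tag is reported. The inItem check skips the feed-level title, which appears before any item.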