I am implementing a search feature that consumes the RSS results of a search engine, using Java and SAX. However, some search results are not well-formed: the body of the `<title>` tag in some entries contains a bare `&` character instead of the escaped `&amp;` (e.g. something like, let's say, `Starsky & Hutch`).
When parsing the RSS, I get an `org.apache.harmony.xml.ExpatParser$ParseException`, which interrupts the whole search so that it returns nothing.
I want my parser to work around these errors, the way Firefox's RSS reader does. What are the possible ways to fix this issue and still parse the RSS feed?
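SAX implementations are designed to detect well-formedness errors and report them as fatal: the XML specification requires a conforming parser to stop normal processing after a fatal error, so there is no standard way to make SAX skip past a bare `&`. The most reasonable workaround I can think of is to patch such errors (for example, stray `&` characters) in the raw text before streaming it to SAX.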
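A minimal sketch of that pre-processing step, assuming the whole feed fits in memory as a `String` (reasonable for a page of search results). `LenientRssParser`, `escapeStrayAmpersands` and `parseTitles` are hypothetical names; only the `javax.xml.parsers` and `org.xml.sax` classes are standard JDK APIs (also available on Android). The regular expression escapes every `&` that does not begin one of XML's five predefined entity references or a numeric character reference. It is a heuristic: it would also rewrite `&` inside CDATA sections and comments, and it turns undeclared HTML entities such as `&nbsp;` into literal text rather than failing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class LenientRssParser {

    // Matches an '&' that is NOT the start of a predefined entity
    // (&amp; &lt; &gt; &quot; &apos;) or a numeric character reference (&#38; or &#x26;).
    private static final Pattern STRAY_AMP =
            Pattern.compile("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    /** Hypothetical helper: escapes stray ampersands, leaving valid references intact. */
    public static String escapeStrayAmpersands(String xml) {
        return STRAY_AMP.matcher(xml).replaceAll("&amp;");
    }

    /** Hypothetical helper: patches the feed, then collects the text of every <title> via SAX. */
    public static List<String> parseTitles(String rawXml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        List<String> titles = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a <title> element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("title".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may split one text node into several calls, so accumulate.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName) && text != null) {
                    titles.add(text.toString());
                    text = null;
                }
            }
        };

        String patched = escapeStrayAmpersands(rawXml);
        parser.parse(new InputSource(new StringReader(patched)), handler);
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String feed = "<rss><channel>"
                + "<item><title>Starsky & Hutch</title></item>"
                + "<item><title>Tom &amp; Jerry &#38; friends</title></item>"
                + "</channel></rss>";
        System.out.println(parseTitles(feed));
    }
}
```

Running `main` should print `[Starsky & Hutch, Tom & Jerry & friends]`. Patching only the ampersands, rather than switching to a lenient HTML-style parser, keeps SAX strict for every other error, so a feed that is broken in some other way still fails loudly instead of being silently misread.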