java rss sax well-formed

Java SAX: parsing XML that is not well-formed


I am implementing an RSS search feature that reads results from a search engine, using Java and SAX. However, some search results are not well-formed: the body of the <title> tag of some entries contains a raw & character instead of &amp; (for example, something like Starsky & Hutch).

When parsing the RSS, I get an org.apache.harmony.xml.ExpatParser$ParseException, which aborts the whole search so that it returns nothing.

I want my parser to work around these errors, like Firefox's RSS reader does. What are the possibilities for fixing this issue and parsing the RSS feed?


Solution

  • SAX implementations are designed to detect well-formedness errors and throw exceptions; the XML specification treats such errors as fatal, so there is no standard way to make the parser continue past them. The most reasonable approach I can think of is to patch the errors, such as stray & characters, in the raw text before streaming it to SAX.
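
As a minimal sketch of that patching step, assuming the whole feed fits in memory as a String: the class name AmpersandPatcher and both method names are hypothetical, chosen only for illustration. The regular expression escapes any & that does not begin something shaped like a named, decimal, or hexadecimal reference. It does not check that a named entity is actually declared, and it would also rewrite a stray & inside a CDATA section, so treat it as a starting point rather than a complete repair.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: pre-processes feed text before handing it to SAX.
public class AmpersandPatcher {

    // Matches '&' that is NOT followed by "name;", "#digits;" or "#xhex;".
    private static final Pattern STRAY_AMP = Pattern.compile(
            "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Replaces every stray '&' with "&amp;" and leaves existing references alone.
    public static String escapeStrayAmpersands(String xml) {
        return STRAY_AMP.matcher(xml).replaceAll("&amp;");
    }

    // Patches the raw feed text, then parses it with the platform's SAX parser.
    public static void parse(String rawFeed, DefaultHandler handler) throws Exception {
        String patched = escapeStrayAmpersands(rawFeed);
        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(patched)), handler);
    }
}
```

With this in place, a title such as Starsky & Hutch reaches the parser as Starsky &amp; Hutch, while an existing &amp; or &#38; is left unchanged. Other kinds of damage (unclosed tags, bad encodings) would need their own fixes or a lenient parser.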