I am currently using SAX to parse some HTML. However, I now have to parse a document that contains something like this:
`<OPTION VALUE="123" SELECTED>`
and because SELECTED does not have an actual value set, it is throwing an error (not well-formed, invalid token). Is there a way to resolve this so I can keep using SAX?
My code:
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser sp = spf.newSAXParser();
XMLReader xr = sp.getXMLReader();
xr.setContentHandler(sch);
InputSource is = new InputSource(Statics.SUBJECT_CODE_URL);
xr.parse(is);
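You can't parse HTML with the XML parser that `SAXParserFactory` gives you. HTML is not XML: it allows minimized attributes such as `SELECTED`, unquoted attribute values, and omitted end tags, all of which XML forbids. A perfectly valid HTML document is therefore usually NOT a well-formed XML document, and no setting will make a conforming XML parser accept it. Use a parser written for HTML instead.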
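As a rough illustration of "use an HTML parser instead", here is a minimal sketch built on the lenient HTML parser that ships with the JDK (`javax.swing.text.html.parser.ParserDelegator`). It is not SAX, but its callback interface is similar in spirit to a `ContentHandler`. The class name `LenientOptionParser`, the `Option` holder, and the sample markup are all hypothetical, invented for this example; only the `OPTION`, `VALUE`, and `SELECTED` names come from the question.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects <OPTION> tags from HTML that an XML parser would reject.
public class LenientOptionParser {

    /** Hypothetical holder for the two attributes the question cares about. */
    public static final class Option {
        public final String value;
        public final boolean selected;

        Option(String value, boolean selected) {
            this.value = value;
            this.selected = selected;
        }
    }

    /** Parses the HTML from the reader and returns every OPTION start tag found. */
    public static List<Option> parseOptions(Reader html) throws IOException {
        final List<Option> options = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (HTML.Tag.OPTION.equals(tag)) {
                    Object value = attrs.getAttribute(HTML.Attribute.VALUE);
                    // A minimized attribute such as SELECTED is still reported as present,
                    // so test for presence rather than for a particular value.
                    boolean selected = attrs.isDefined(HTML.Attribute.SELECTED);
                    // The parser may reuse the attribute set, so copy what we need right away.
                    options.add(new Option(value == null ? null : value.toString(), selected));
                }
            }
        };

        // The third argument tells the parser to ignore charset declarations in the document.
        new ParserDelegator().parse(html, callback, true);
        return options;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample markup with a minimized attribute and no </OPTION> end tags.
        String html = "<HTML><BODY><FORM><SELECT NAME=\"subject\">"
                + "<OPTION VALUE=\"123\" SELECTED>First"
                + "<OPTION VALUE=\"456\">Second"
                + "</SELECT></FORM></BODY></HTML>";

        for (Option option : parseOptions(new StringReader(html))) {
            System.out.println(option.value + " selected=" + option.selected);
        }
    }
}
```

To read from a URL as in the question, you could pass something like `new InputStreamReader(new URL(...).openStream())` as the reader. If keeping your existing `ContentHandler` matters, a third-party HTML parser that exposes a SAX `XMLReader` (TagSoup is one) is probably a better fit than this sketch.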