Tags: java, android, xml, sax

SAX - HTML attribute with no value


I am currently using SAX to parse some HTML. However, I now have to parse a document that contains something like this:

`<OPTION VALUE="123" SELECTED>`

Because SELECTED has no value, the parser throws an error (not well-formed, invalid token). Is there a way to resolve this so I can keep using SAX?

My code:

        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();
        XMLReader xr = sp.getXMLReader();

        xr.setContentHandler(sch); // sch: the app's own ContentHandler
        InputSource is = new InputSource(Statics.SUBJECT_CODE_URL); // read via system ID (the page URL)
        xr.parse(is);
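To reproduce the failure outside the app, here is a minimal, self-contained sketch of the same SAX setup. It assumes the page is supplied as an in-memory string rather than fetched from `Statics.SUBJECT_CODE_URL`, and it uses a hypothetical logging `DefaultHandler` in place of `sch`. The exact error text depends on the parser implementation: the JDK's built-in parser words it differently from the "not well-formed, invalid token" message quoted above, but both reject the attribute.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class MinimizedAttributeRepro {

    /** Parses doc with the question's SAX setup; returns the error message, or null if accepted. */
    static String parseError(String doc) {
        // Hypothetical stand-in for the question's `sch`: logs each start tag it sees.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                System.out.println("start: " + qName + " (" + atts.getLength() + " attributes)");
            }
        };
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            SAXParser sp = spf.newSAXParser();
            XMLReader xr = sp.getXMLReader();
            xr.setContentHandler(handler);
            // DefaultHandler.fatalError rethrows the exception instead of ignoring it.
            xr.setErrorHandler(handler);
            // In-memory document instead of Statics.SUBJECT_CODE_URL.
            xr.parse(new InputSource(new StringReader(doc)));
            return null;
        } catch (SAXParseException e) {
            // Well-formedness errors arrive here, with the position of the problem.
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Valid HTML, but SELECTED has no value, so this is not well-formed XML.
        System.out.println(parseError("<SELECT><OPTION VALUE=\"123\" SELECTED>Maths</OPTION></SELECT>"));
        // The XML spelling of the same attribute is accepted, so parseError returns null.
        System.out.println(parseError("<SELECT><OPTION VALUE=\"123\" SELECTED=\"SELECTED\">Maths</OPTION></SELECT>"));
    }
}
```

The handler may log the elements reached before the error, which shows that SAX streams events up to the first well-formedness violation and then stops.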

Solution

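  • You can't use SAX (with an XML parser) to parse HTML. HTML is not XML. A perfectly valid HTML document is generally NOT a well-formed XML document: HTML allows an attribute with no value, such as `SELECTED`, while XML requires every attribute to have a quoted value. Nothing you can do will make an XML parser accept it.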
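One way to check the claim is to feed a SAX parser a few snippets that are valid HTML next to their XML-style spellings. This is a minimal sketch using only the JDK's built-in parser; the class name `HtmlVsXml` and the sample snippets are illustrative. Besides attributes with no value, HTML also permits unquoted attribute values, void elements such as `<BR>` without a closing slash, and omitted end tags, none of which are well-formed XML.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class HtmlVsXml {

    /** Returns true if the JDK's SAX parser accepts the snippet as well-formed XML. */
    static boolean isWellFormedXml(String snippet) {
        XMLReader xr;
        try {
            xr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("no SAX parser available", e);
        }
        DefaultHandler handler = new DefaultHandler();
        xr.setContentHandler(handler);
        xr.setErrorHandler(handler); // fatalError rethrows, so bad input ends in SAXException
        try {
            xr.parse(new InputSource(new StringReader(snippet)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // {description, valid-HTML spelling, well-formed-XML spelling}
        String[][] cases = {
            {"attribute with no value", "<OPTION VALUE=\"123\" SELECTED>Maths</OPTION>",
                                        "<OPTION VALUE=\"123\" SELECTED=\"SELECTED\">Maths</OPTION>"},
            {"unquoted attribute value", "<OPTION VALUE=123>Maths</OPTION>",
                                         "<OPTION VALUE=\"123\">Maths</OPTION>"},
            {"void element, no slash", "<P>line<BR>next</P>", "<P>line<BR/>next</P>"},
            {"omitted end tags", "<UL><LI>one<LI>two</UL>", "<UL><LI>one</LI><LI>two</LI></UL>"},
        };
        for (String[] c : cases) {
            System.out.printf("%-26s HTML form accepted: %-5b  XML form accepted: %b%n",
                    c[0], isWellFormedXml(c[1]), isWellFormedXml(c[2]));
        }
    }
}
```

Only the right-hand spellings should be accepted, which is the gap between HTML and XML that the answer describes.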