
How do I define a new entity for the HtmlUnit XML parser?


I'm running into an issue with the HtmlUnit parser. I'm trying to grab some XML from a website (using the website's API), do a quick parse of the resulting XML, and then save the XML to a file (all within the rights of the API). (sample content)

Unfortunately, the website returns the entity &iquest; (¿) in some of the requested pages. Although this is a valid HTML entity, HtmlUnit throws an exception during the parse with the message:

The entity "iquest" was referenced, but not declared.

How do I define iquest as a valid entity?
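To reproduce the error outside HtmlUnit, here is a minimal Java sketch that uses the JDK's built-in JAXP DOM parser instead of HtmlUnit itself. It assumes HtmlUnit's XML handling behaves like a standard non-validating XML parser in this respect, which is consistent with the message quoted above. The class and method names are hypothetical. The five entities that XML predefines parse fine, while &iquest; fails because no DTD declares it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class IquestRepro {

    // Hypothetical helper: parses the string with the JDK's default DOM parser and
    // returns the parse error message, or null if the document is well-formed.
    static String parseError(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        } catch (Exception e) {
            // Parser configuration or I/O problems are unexpected here.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The five predefined XML entities need no declaration...
        System.out.println(parseError("<r>&lt;&gt;&amp;&quot;&apos;</r>"));
        // ...but an HTML-only entity such as &iquest; is rejected.
        System.out.println(parseError("<r>&iquest;</r>"));
    }
}
```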


Solution

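  • You can't define &iquest; except by editing the data you received. XML predefines only five entities (&lt;, &gt;, &amp;, &quot; and &apos;). &iquest; is an HTML entity, so an XML parser rejects it unless a DTD declares it. The data is therefore not well-formed XML, as any validator will show (e.g. the first one I found on Google).

    The site is not serving well-formed XML, so the best fix is to ask the site to correct it.

    If that fails, either do a search and replace on &iquest; or add a DOCTYPE that declares the entity &iquest;.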
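Both workarounds can be sketched in Java with the JDK's built-in DOM parser. This is a minimal sketch under several assumptions: the raw response is already available as a String, it has no DOCTYPE of its own, and the caller knows the root element name. The class IquestFix, its methods and the sample document are hypothetical. Replacing &iquest; with the numeric character reference &#191; keeps the saved file ASCII-only. The DOCTYPE variant instead declares iquest in an internal DTD subset, and a non-validating parser still processes entity declarations there.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class IquestFix {

    // Workaround 1 (search and replace): swap the undeclared entity for the numeric
    // character reference of the same character, U+00BF INVERTED QUESTION MARK.
    static String replaceIquest(String xml) {
        return xml.replace("&iquest;", "&#191;");
    }

    // Workaround 2 (DOCTYPE): declare iquest in an internal DTD subset.
    // ASSUMPTION: the document has no DOCTYPE yet and rootElement is its root name.
    // An XML declaration, if present, must stay first, so insert right after it.
    static String addIquestDoctype(String xml, String rootElement) {
        String doctype = "<!DOCTYPE " + rootElement + " [<!ENTITY iquest \"&#191;\">]>";
        if (xml.startsWith("<?xml")) {
            int end = xml.indexOf("?>") + 2;
            return xml.substring(0, end) + doctype + xml.substring(end);
        }
        return doctype + xml;
    }

    // Parses with the JDK's default (non-validating) DOM parser and returns the
    // text content of the root element; checked exceptions are wrapped.
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample response standing in for the site's API output.
        String raw = "<?xml version=\"1.0\"?><result><title>&iquest;Qu&#233;?</title></result>";
        System.out.println(rootText(replaceIquest(raw)));
        System.out.println(rootText(addIquestDoctype(raw, "result")));
    }
}
```

Either fixed string can then be parsed or written to a file. How you feed it back into HtmlUnit depends on how you fetch the page and is not shown here.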