Tags: scala, spray, spray-client

Parsing HTML with Spray


I get the exception "The entity "nbsp" was referenced, but not declared" when unmarshalling an HttpEntity into a NodeSeq with spray.httpx.unmarshalling.BasicUnmarshallers.NodeSeqUnmarshaller. The response is valid HTML that contains the &nbsp; entity, which makes it invalid XML, and I do not control the server.

I can probably preprocess the HTML to remove &nbsp;, but what is the accepted method for parsing HTML (with &nbsp;) with Spray?


Solution

  • You could write a custom Unmarshaller that wraps jsoup, an HTML parser that accepts HTML entities such as &nbsp; instead of requiring well-formed XML.
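
A minimal sketch of such an unmarshaller, assuming spray 1.x (spray-http and spray-httpx) and jsoup are on the classpath. The names JsoupUnmarshalling and JsoupDocumentUnmarshaller are made up for this illustration, and the result is a jsoup Document rather than a scala.xml.NodeSeq:

```scala
import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import spray.http.{HttpEntity, MediaTypes}
import spray.httpx.unmarshalling.Unmarshaller

// Hypothetical holder object; import its members where the unmarshaller is needed.
object JsoupUnmarshalling {

  // Accepts HTML and XHTML responses and parses them with jsoup,
  // which tolerates HTML entities such as &nbsp;.
  implicit val JsoupDocumentUnmarshaller: Unmarshaller[Document] =
    Unmarshaller[Document](MediaTypes.`text/html`, MediaTypes.`application/xhtml+xml`) {
      case HttpEntity.NonEmpty(contentType, data) =>
        // Decode the body using the charset declared in the Content-Type header.
        val html = new String(data.toByteArray, contentType.charset.nioCharset)
        Jsoup.parse(html)
      case HttpEntity.Empty =>
        // An empty body yields jsoup's empty document shell.
        Jsoup.parse("")
    }
}
```

With the implicit in scope (import JsoupUnmarshalling._), entity.as[Document] should return an Either holding the parsed document, and a spray-client pipeline could end in unmarshal[Document] instead of unmarshal[NodeSeq]. Elements can then be queried with jsoup's select, for example doc.select("p").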