java html encode jsoup

Get encoded HTML content only from a URL in Java


Is there a Java library that can HTML-encode only the text content of a document, leaving the markup untouched?

For example, I have

<div>Tél</div>

and I only want

<div>T&eacute;l</div>

instead of

&lt;div&gt;T&eacute;l&lt;/div&gt;

I need this library to encode an entire HTML document. I have tried the jsoup library, but it has bugs when handling some objects.

Thanks


Solution

  • Parsing HTML with regular expressions is never a good idea; it is a recipe for disaster.

    So first look at this Q&A on HTML parsing in Java: Java HTML Parsing

    Once you can parse the HTML and extract its inner text, you can encode that text in one of these ways: Is there a JDK class to do HTML encoding (but not URL encoding)?
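For the exact case in the question, where the only characters that need encoding are non-ASCII ones such as é, there is a simpler route that skips parsing entirely. Markup characters are all ASCII, so replacing each non-ASCII code point with an entity leaves the tags alone. The sketch below is only an illustration of that idea, not the approach described in the linked answers. The class name `HtmlContentEncoder`, the method `encodeNonAscii`, and the small `NAMED` table are hypothetical, and the table covers only a handful of entities. The sketch assumes the input is already well-formed HTML. It does not escape `&`, `<`, or `>` inside text, and it would also rewrite non-ASCII characters inside `<script>` or `<style>` blocks.

```java
import java.util.Map;

class HtmlContentEncoder {
    // Tiny illustrative subset of named entities (assumption: a real table
    // would cover the full HTML entity list). Keys are Unicode code points.
    private static final Map<Integer, String> NAMED = Map.of(
            0xE9, "eacute",
            0xE8, "egrave",
            0xE0, "agrave",
            0xE7, "ccedil");

    // Replaces every non-ASCII code point with a named or numeric entity.
    // ASCII characters, including all tag delimiters, pass through unchanged.
    static String encodeNonAscii(String html) {
        StringBuilder out = new StringBuilder(html.length() + 16);
        html.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.append((char) cp);
            } else if (NAMED.containsKey(cp)) {
                out.append('&').append(NAMED.get(cp)).append(';');
            } else {
                // Fall back to a decimal numeric character reference.
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // prints <div>T&eacute;l</div>
        System.out.println(encodeNonAscii("<div>T\u00e9l</div>"));
    }
}
```

Characters missing from the table come out as numeric references (for example `&#8364;` for the euro sign), which browsers render the same way as the named form. If the text may also contain literal `&` or `<` that must be escaped, a real parser is still the safer choice.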