java jsoup

Jsoup Element.val() decodes the encoded HTML content


I have an input tag inside a form whose value is set from one of the query parameters of the URL. For XSS protection, I HTML-encode the query parameter value before setting it in the input tag.

Original Value sent in URL:

SomeValueWithSpeci@lCh@cters<""><''>

HTML Content generated by the Java code:

<form>
    <input type='hidden' value="SomeValueWithSpeci@lCh@cters&lt;&quot;&quot;&gt;&lt;''&gt;" />
</form>
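
The encoding step that produces the markup above might look roughly like the following. This is a minimal sketch, not the code from the original application: the class name and the `escapeHtmlAttribute` helper are hypothetical, and it only escapes the characters that matter inside a double-quoted attribute value. A real application would more likely rely on an established encoder library.

```java
class HtmlEscapeSketch {

    // Hypothetical helper (an assumption, not from the original post):
    // escapes the characters that are significant inside a double-quoted
    // HTML attribute value. Single quotes are left untouched.
    static String escapeHtmlAttribute(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The original value sent in the URL
        String queryParam = "SomeValueWithSpeci@lCh@cters<\"\"><''>";
        String html = "<form>\n"
                + "    <input type='hidden' value=\"" + escapeHtmlAttribute(queryParam) + "\" />\n"
                + "</form>";
        System.out.println(html);
    }
}
```

Because the attribute is delimited with double quotes, leaving the single quotes unescaped matches the generated HTML shown above.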

Java code to parse the above HTML content:

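import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.parse(htmlResponse);
Elements formElements = doc.getElementsByTag("form");
Elements inputTags = new Elements();
for (Element form : formElements) {
    // Collect the input tags of every form
    inputTags.addAll(form.getElementsByTag("input"));
}

for (Element input : inputTags) {
    // val() returns the value attribute of the input
    System.out.println(input.val());
}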

Output:

SomeValueWithSpeci@lCh@cters<""><''>

On submitting the form, the browser decodes the HTML content and sends the actual value to the receiver. I am writing a test to verify the encoding. It sends a request to the endpoint and receives this HTML response. If I print the raw response, the encoded string is still encoded, but when I read it through the Jsoup library it comes back decoded. I believe the value is decoded either when the HTML is parsed or when I retrieve the input tag's value with element.val(). I would like to know at which point it is actually decoded.

Also, is there any way to retrieve the encoded value as-is using the Jsoup library?


Solution

  • Apache Commons - StringEscapeUtils.unescapeHtml4

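    // import org.apache.commons.text.StringEscapeUtils;
    String text = "&quot;bread&quot;";
    String decoded = StringEscapeUtils.unescapeHtml4(text); // "bread" (including the double quotes)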
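
On the question of timing: an HTML parser resolves character references while it parses, so by the time the input element exists its value attribute already holds the decoded text, and val() simply returns it. If the test needs to assert on the encoded form, one option is to inspect the raw response string before it is parsed. The following is a rough sketch under that assumption. The class name, the `rawInputValue` helper, and the regular expression are hypothetical, and the pattern only handles a double-quoted value attribute on an input tag, so it is not a general HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RawAttributeSketch {

    // Hypothetical pattern (an assumption): matches value="..." inside an
    // <input ...> tag and captures the text between the double quotes.
    private static final Pattern INPUT_VALUE = Pattern.compile(
            "<input\\b[^>]*\\bvalue=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the value attribute exactly as written in the markup,
    // or null if no matching input tag is found.
    static String rawInputValue(String html) {
        Matcher m = INPUT_VALUE.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String htmlResponse = "<form>\n"
                + "    <input type='hidden' value=\"SomeValueWithSpeci@lCh@cters&lt;&quot;&quot;&gt;&lt;''&gt;\" />\n"
                + "</form>";
        // Prints the value as it appears in the response, entities still intact
        System.out.println(rawInputValue(htmlResponse));
    }
}
```

A test could then compare this raw value with the expected encoded string, and separately use Jsoup's val() to confirm that the decoded value matches the original query parameter.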