We are developing an application that takes HTML as user input and renders the same HTML as output on a different page. The input must never contain any dynamic behaviour, such as script tags.
We HTML-encode the value in JavaScript and save the encoded value in the DB. We HTML-decode the saved value and render it on the new page to get the expected result (see the example below).
From what I have read so far, I should HTML-encode the input before rendering it as output on a different page. The problem is that, when I do this, the HTML added by the user is displayed as literal markup on the new page.
Example:
User Input:
<div><h2>Header</h2><p>this is the body text</p></div>
Output on the new page when the value is HTML-encoded and assigned to another div:
<div><h2>Header</h2><p>this is the body text</p></div>
Expected:
Header
this is the body text
The only way I was able to achieve the expected result was to HTML-decode the saved value and assign it to another container control.
Am I missing something? I have tried every way I know of HTML-encoding the user input, and rendering it back does not give me the expected result. Any idea how to achieve this?
If there is no other solution, is there a validation framework available in .NET to avoid XSS attacks? I have gone through the AntiXSS framework from Microsoft, but it is geared towards stripping harmful HTML and encoding. It does not help with letting users know that they should not enter certain tags.
Thanks for any help in advance.
If the user input is HTML, and you encode it before saving it, then when you display it, you should decode it.
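To make the round trip concrete, here is a minimal sketch in Python (chosen only for illustration; the question itself uses JavaScript and .NET). It assumes the standard library `html` module stands in for whatever encode/decode helpers the application really uses. It shows that decoding returns exactly what the user typed, so encoding for storage adds no protection by itself.

```python
import html

# What the user typed into the form.
user_html = "<div><h2>Header</h2><p>this is the body text</p></div>"

# Encoded form, as it would be saved in the DB.
stored = html.escape(user_html)

# Decoded form, as it would be handed to the browser for rendering.
rendered = html.unescape(stored)

# The round trip is lossless: the browser receives the original markup.
print(rendered == user_html)

# The same holds for dangerous input: a script tag survives unchanged.
attack = "<script>alert(1)</script>"
round_tripped_attack = html.unescape(html.escape(attack))
print(round_tripped_attack == attack)
```

Because the decoded value is identical to the input, any filtering of unwanted tags has to happen as a separate step.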
The recommendation to encode before displaying applies when the user input is expected to be text. Encoding is recommended for general display purposes (so that an ampersand actually displays as &) and also to prevent potentially malicious input from being rendered on the page and interpreted by the browser (e.g. <script> tags).
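As a rough illustration of the text case, the sketch below (Python again, with a made-up `comment` value) encodes input that is meant to be shown as plain text. After encoding, the browser displays the characters literally instead of interpreting them as markup.

```python
import html

# Hypothetical plain-text input, e.g. a comment field.
comment = 'Tom & Jerry <script>alert("x")</script>'

# Encode once, at the point where the text is written into the page.
safe = html.escape(comment)

print(safe)
# Tom &amp; Jerry &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
```

The reader sees the original characters, including the ampersand and the angle brackets, but nothing is executed.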
Please be careful: if you intend to display HTML provided by a user, sanitize the input as much as possible. Make sure they aren't trying to do anything malicious, and also make sure they don't make a simple mistake that could wreck the entire layout of a webpage (e.g. an opening tag without a closing tag). This type of sanitization is no simple task and is one of the major reasons other flavors of markup exist in the first place (e.g. Markdown, BBCode, etc.).
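One possible way to tell users which tags they should not enter is to validate against an allowlist and report the problems back. The following is only a sketch under assumptions: the `ALLOWED_TAGS` set, the `AllowlistValidator` class and the `validate` function are hypothetical names, the rule that no attributes are permitted is an arbitrary choice, and Python's `html.parser` does not parse exactly the way a browser does. It is meant for user feedback, not as a replacement for a maintained sanitizer.

```python
from html.parser import HTMLParser

# Hypothetical allowlist; adjust to whatever markup the application permits.
ALLOWED_TAGS = {"div", "h2", "p", "b", "i", "ul", "li"}


class AllowlistValidator(HTMLParser):
    """Collects human-readable problems instead of silently stripping markup."""

    def __init__(self):
        super().__init__()
        self.errors = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            self.errors.append(f"tag <{tag}> is not allowed")
            return
        if attrs:
            # Assumption: no attributes at all, which also rules out onclick etc.
            names = ", ".join(name for name, _ in attrs)
            self.errors.append(f"attributes not allowed on <{tag}>: {names}")
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        elif tag in ALLOWED_TAGS:
            self.errors.append(f"unexpected closing tag </{tag}>")


def validate(user_html):
    """Return a list of problems; an empty list means the input passed."""
    validator = AllowlistValidator()
    validator.feed(user_html)
    validator.close()
    validator.errors.extend(
        f"tag <{tag}> is never closed" for tag in validator.open_tags
    )
    return validator.errors


print(validate("<div><h2>Header</h2><p>this is the body text</p></div>"))
print(validate("<script>alert(1)</script>"))
print(validate("<div><p>unclosed"))
```

The returned messages can be shown next to the input field, so the user learns which tags to remove rather than having them dropped silently.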