Dealing with htmlescape/htmlspecialchars


To prevent XSS, whenever you output user input back to the page (for example, when showing what was entered incorrectly, or when re-painting the form with the previously submitted values), you need to escape the HTML. That much is certain.

So doing something like

echo "the name which was supplied as {$_GET['company_name']} is not accepted";

would not be right.

Instead, we would do this:

echo "the name which was supplied as " . htmlspecialchars($_GET['company_name']) . " is not accepted";

With that in mind, here comes my question: what do you do when $_GET['company_name'] needs to be displayed back in the textbox where it came from? Maybe you want your user to correct that company_name just because it's too long.

If you were to use htmlspecialchars, and the company_name was, say, AT&T, the & would be escaped and appear as &amp;, wouldn't it?

So how do we deal with this situation? Of course, one might say: then don't run htmlspecialchars on it, just return it as is.

But then somebody may send us a company_name that is carefully crafted to break out of the textbox, attach a JavaScript onclick handler, and do the XSS from there.

How do you deal with HTML escaping in these situations? Just use history.go(-1)?


Solution

  • I strongly encourage you to check out the OWASP XSS prevention cheat sheet if you're interested in learning more about preventing XSS.

    When a browser renders HTML (and associated content, like CSS), it identifies different rendering contexts for different types of input. Each context has distinct semantics for how and when it can execute script code. So your browser's rules for handling HTML are different from the rules it uses for JavaScript, which are different from the rules for CSS, and so on. This means that if you're trying to prevent XSS, you have to pay close attention to the context the untrusted data is being put in.

    If you are using server-side code like PHP to echo unsafe values into HTML attributes (including the value of a form input), you need to escape the text for HTML attributes. Assuming the page is using UTF-8 encoding, you would do something like:

    <input type="text" value="<?php echo htmlspecialchars($_GET['company_name'], ENT_QUOTES, 'UTF-8'); ?>" >
    

    The "ENT_QUOTES" option is important because it tells PHP to HTML escape both double and single quotation marks. Unescaped quotation marks can be used to "break out" of an attribute and add JavaScript event handlers like "onclick", "onfocus", etc.
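
    As a rough illustration of that break-out, here is a minimal sketch. The payload string and the render_input() helper are made up for this example, and the attribute is deliberately single-quoted so the difference shows up. ENT_COMPAT is passed explicitly for the unsafe case because newer PHP versions (8.1 and later) already include ENT_QUOTES in the default flags.

    ```php
    <?php
    // Hypothetical hostile input: closes a single-quoted attribute and adds a handler.
    $payload = "x' onfocus='alert(1)";

    // Hypothetical helper: renders a single-quoted value attribute with the given flags.
    function render_input(string $value, int $flags): string
    {
        return "<input type='text' value='" . htmlspecialchars($value, $flags, 'UTF-8') . "'>";
    }

    // ENT_COMPAT escapes double quotes only, so the single quote survives
    // and the onfocus handler becomes a real attribute.
    $unsafe = render_input($payload, ENT_COMPAT);

    // ENT_QUOTES also escapes the single quote, so the payload stays inside value.
    $safe = render_input($payload, ENT_QUOTES);

    echo $unsafe, "\n";
    echo $safe, "\n";
    ```

    With ENT_QUOTES the single quotes come out as &#039;, so the whole payload remains inert text inside the value attribute.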

    In your "AT&T" example, you would not see &amp; in the input box. This is because in the context of an HTML attribute, your browser renders HTML entities (like &amp;) as their associated characters (like &).
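
    The round trip can be sketched in PHP itself. This is only a model: html_entity_decode() stands in for the decoding your browser performs when it parses the attribute value, which is an assumption made for illustration and not how a browser is implemented.

    ```php
    <?php
    $companyName = 'AT&T';

    // What the server writes into the value attribute.
    $escaped = htmlspecialchars($companyName, ENT_QUOTES, 'UTF-8');

    // Stand-in for the browser decoding entities while parsing the attribute.
    $displayed = html_entity_decode($escaped, ENT_QUOTES, 'UTF-8');

    echo $escaped, "\n";   // the markup contains the entity
    echo $displayed, "\n"; // the text box shows the original text
    ```

    The escaped form only exists in the markup on the wire; the user sees, and later re-submits, the original AT&T.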

    When might you see &amp; in the text box?

    If you modify the value of the input using JavaScript, your browser uses a different set of rules to determine how the new value is handled. If you were to HTML escape 'AT&T' and then insert the result with something like yourInput.setAttribute("value", HtmlEscapingFunction('AT&T')), the user would see AT&amp;T. This is because you're now working in a DOM execution context, where the value you pass is taken literally, so HTML escaping it results in double-encoding.
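
    The same effect can be reproduced on the server with PHP's DOMDocument, used here as a stand-in for the browser DOM (an assumption for illustration; it requires the dom extension). Like the browser's setAttribute, DOMElement::setAttribute() treats its argument as literal text, so a value that has already been escaped keeps its entity as visible characters.

    ```php
    <?php
    $doc = new DOMDocument();
    $input = $doc->createElement('input');
    $doc->appendChild($input);

    // Escaping first and then handing the result to the DOM API double-encodes:
    // the attribute's value is now the literal text with the entity in it.
    $input->setAttribute('value', htmlspecialchars('AT&T', ENT_QUOTES, 'UTF-8'));
    $seenDouble = $input->getAttribute('value');

    // Passing the raw string is enough; the DOM handles serialization itself.
    $input->setAttribute('value', 'AT&T');
    $seenPlain = $input->getAttribute('value');

    echo $seenDouble, "\n";
    echo $seenPlain, "\n";
    ```

    The rule of thumb: escape when building markup as a string, and pass raw values when going through a DOM API.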