I've read here:
https://stackoverflow.com/a/8932454/4301970
that htmlspecialchars() is very effective at preventing XSS attacks.
I'm receiving formatted text from a WYSIWYG editor, for example:
<p>
<em>
<strong><span style="font-size:36pt;">test</span></strong>
</em>
</p>
Encoding this in my HTML:
<!DOCTYPE html>
<html lang=en>
<head>
<title></title>
</head>
<body>
<?php echo htmlspecialchars('<p><em><strong><span style="font-size:36pt;">test</span></strong></em></p>', ENT_QUOTES); ?>
</body>
</html>
Will output in the browser, as literal text:
<p><em><strong><span style="font-size:36pt;">test</span></strong></em></p>
How can I display the formatted text correctly, while preventing XSS injections?
The htmlspecialchars function encodes all characters that have (or could have) special meaning in HTML or XML, specifically <, >, &, ", and ' (the single quote only if ENT_QUOTES is set, which is the default since PHP 8.1).
With this setting, any injected markup is displayed as plain text instead of being interpreted by the browser.
For example
<script>alert('bam');</script>
would be
&lt;script&gt;alert('bam');&lt;/script&gt;
//or with quotes constant
&lt;script&gt;alert(&#039;bam&#039;);&lt;/script&gt;
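which the browser won't execute as JavaScript. So that can be an effective means of stopping XSS injections. However, you want users to be able to submit some HTML, so you will need a whitelist of approved elements. You can do that by replacing the < and > of the approved tags with custom text that won't occur in your users' input. In my example below I've chosen custom_random_hack. Then run everything through htmlspecialchars, which will encode all remaining special characters. Finally, convert the swapped placeholders back to their HTML elements. Note that the placeholder has to be something a user cannot guess (for example, a random value generated per request): anyone who submits the placeholder text themselves could smuggle arbitrary tags past the encoding.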
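$string = '<p>
<em>
<strong><span style="font-size:36pt;">test</span></strong>
</em>
</p>';
// Tags allowed through unescaped (bare tags only; a tag with attributes will not match).
$allowedtags = array('p', 'em', 'strong');
// Builds ~<(/?(?:p|em|strong))>~ which matches <p>, </p>, <em>, </em>, ...
$pattern = '~<(/?(?:' . implode('|', $allowedtags) . '))>~';
// 1. Swap the angle brackets of allowed tags for placeholders.
$string = preg_replace($pattern, '#custom_random_hack$1custom_random_hack#', $string);
// 2. Encode everything else, then 3. turn the placeholders back into < and >.
echo str_replace(array('#custom_random_hack', 'custom_random_hack#'), array('<', '>'), htmlspecialchars($string, ENT_QUOTES));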
Demo: https://eval.in/582759
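A variation on the same idea, offered only as a sketch: encode everything first, then restore the whitelisted tags from their encoded form (&lt;p&gt; back to <p>). That removes the need for a placeholder string. The function name allow_basic_tags is hypothetical and not part of PHP or of the code above. Like the code above, it only lets through bare tags without attributes, so the span with its style attribute stays encoded.

```php
<?php
// Hypothetical helper (an assumption for illustration): escape everything,
// then un-escape only bare whitelisted tags such as <p> or </em>.
function allow_basic_tags(string $html, array $allowedTags): string
{
    // Encode every special character first, so nothing is live HTML.
    $encoded = htmlspecialchars($html, ENT_QUOTES);

    // Quote the tag names for regex use, then match the encoded form
    // "&lt;tag&gt;" or "&lt;/tag&gt;" of each allowed tag.
    $names = implode('|', array_map(function ($tag) {
        return preg_quote($tag, '~');
    }, $allowedTags));
    $pattern = '~&lt;(/?(?:' . $names . '))&gt;~';

    // Restore only those exact matches to real tags.
    return preg_replace($pattern, '<$1>', $encoded);
}

$input = '<p><em><strong><span style="font-size:36pt;">test</span></strong></em></p>'
       . '<script>alert(\'bam\');</script>';
echo allow_basic_tags($input, array('p', 'em', 'strong'));
```

Because htmlspecialchars double-encodes by default, a user who types &lt;p&gt; literally ends up with &amp;lt;p&amp;gt;, which the pattern does not match, so only tags that were real < > pairs in the input are restored. For WYSIWYG output that needs attributes or styles, a dedicated HTML sanitizer library is probably the safer route.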