HTMLPurifier Truncating Output


Why is HTML Purifier truncating the content here? I can't work out the cause.

It is weird that it seemingly drops everything before and after the PHP code and displays only the PHP code, isn't it?

It's as if, whenever the content contains PHP code, only that code is displayed and all other content is dropped. The same behavior occurs on HTML Purifier's forums.

I have narrowed it down to the following behavior:

IF PHP CODE:
  Truncate EVERYTHING but PHP CODE

View:

  <div class='groom_log_content'>
    <fieldset class='border-fields'>
      <legend class='bold'><?php echo $groom['content_name']; ?></legend>
        <p class='editable_textarea fix_space' id='<?php echo $groom['content_id']; ?>'><?php echo $this->cleaner->purify($content); ?></p>
    </fieldset>
  </div>

Config:

<?php
require 'htmlpurifier-4.6.0/library/HTMLPurifier.auto.php'; // HTML Purifier

class Clean {
    public function __construct() {
        $this->config = HTMLPurifier_Config::createDefault();
        $this->config->set('Attr.EnableID', true);
        $this->config->set('Attr.IDPrefix', 'gc_');
        $this->config->set('HTML.AllowedAttributes', '*.style,*.id,*.title,*.class,a.href,a.target,img.src,img.alt');
        $this->config->set('HTML.Allowed', 'a, a.href, abbr, acronym, b, blockquote, br, button, caption, cite, code, dd, del, dfn, div, dl, dt, em, fieldset, i, img, input, ins, kbd, legend, li, ol, p, pre, s, span, style, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, var');
        $this->def = $this->config->getHTMLDefinition(true);
        $this->def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top');
        $this->fieldset = $this->def->addElement(
            'fieldset',
            'Block',
            'Flow',
            'Common',
            array(null)
        );
        $this->legend = $this->def->addElement(
            'legend',
            'Block',
            'Flow',
            'Common',
            array(null)
        );
        $this->input = $this->def->addElement(
            'input',
            'Block',
            'Flow',
            'Common',
            array(null)
        );
        $this->textarea = $this->def->addElement(
            'textarea',
            'Block',
            'Flow',
            'Common',
            array(null)
        );
        $this->button = $this->def->addElement(
            'button',
            'Block',
            'Flow',
            'Common',
            array(null)
        );
        $this->def->addElement('fieldset', 'Form', 'Custom: (#WS?,legend,(Flow|#PCDATA)*)', 'Common');
        $this->cleanse = new HTMLPurifier($this->config);
    }
}

Content from DB that is being truncated:

- The customer posts their MySQL connection string in chat with obvious errors and the agent tells them it looks correct.

5:17:52am Name: <html>
<head>
<title>Connecting to MySQL with PHP</title>
</head>
<body>
<?php
$db_host = 'localhost';
$db_user = 'user';
$db_pass = 'pass';
$conn = mysql_connect('host', 'user', 'pass', 'db');
if(! $conn )
{
die('Could not connect: ' . mysql_error());
}
echo 'Connected successfully';
mysql_close($conn);
?>
</body>
</html>
5:18:18am Name: is that the correct information to input so i can locate the database
5:19:13am Name: That looks to be correct.

Suggestion: Look at this line, it is the important one: $conn = mysql_connect('host', 'user', 'pass', 'db');

Though you cannot diagnose the code, you could immediately correct two issues. You must look for a valid cPanel database, database username, and address. Here, neither the database name nor the database username would be simply 'user', and if this is connecting to a database locally you would use localhost instead of the IP address.

Actual Output:

&lt;?php
$db_host = 'localhost';
$db_user = 'user';
$db_pass = 'pass';
$conn = mysql_connect('host', 'user', 'pass', 'db');
if(! $conn )
{
die('Could not connect: ' . mysql_error());
}
echo 'Connected successfully';
mysql_close($conn);
?&gt;

Solution

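  • I think the PHP tags are a red herring. The real problem is that HTML Purifier doesn't support the html/title/body tags, so when the input looks like a full document it extracts the body portion and purifies only that. In your content the only thing inside <body> is the PHP block, which is why the chat text before <html> and after </html> disappears.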
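
To make the explanation above concrete, here is a minimal sketch in plain PHP. It assumes the extraction behaves roughly like "keep whatever sits between <body> and </body>". The function `extract_body_like_purifier()` is a hypothetical imitation written for illustration, not HTML Purifier's actual code, and `escape_for_display()` is a hypothetical helper showing one possible workaround: escaping the stored text first so the pasted markup is treated as text rather than as a document.

```php
<?php
// Rough imitation of the "full document -> body fragment" step described above.
// This is NOT HTML Purifier's real implementation, only an illustration.
function extract_body_like_purifier(string $html): string
{
    // If the input contains <body>...</body>, keep only what is inside it.
    if (preg_match('~<body[^>]*>(.*)</body>~is', $html, $m)) {
        return $m[1];
    }
    // No body element found: the input is treated as a fragment and kept whole.
    return $html;
}

// Hypothetical workaround: escape the text so the pasted markup is displayed
// literally instead of being interpreted as an HTML document.
function escape_for_display(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Shortened stand-in for the chat log stored in the database.
$stored = "5:17:52am Name: <html>\n<body>\n<?php echo 'hi'; ?>\n</body>\n</html>\n"
        . "5:19:13am Name: That looks to be correct.";

// Only the PHP block survives; the chat lines around it are lost.
$fragment = extract_body_like_purifier($stored);

// After escaping there is no literal <body> tag left, so nothing is cut.
$escaped = escape_for_display($stored);
$kept = extract_body_like_purifier($escaped);

echo $fragment, "\n";
echo $kept, "\n";
```

Escaping is only appropriate if the stored content is meant to be shown as literal text; if it can also contain markup that should render, it would need to be handled differently. HTML Purifier's configuration documentation also lists a `Core.ConvertDocumentToFragment` directive that governs this conversion, though whether turning it off gives acceptable output for this content has not been tested here.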