Link inside text is altered by HTML Purifier


I have a link inside a text string:

$va = "Some text http://www.stackoverflow.com?var=1&var2=2 more text";

When I purify it with this configuration:

$config = HTMLPurifier_Config::createDefault();
$config->set('URI.MakeAbsolute', false);
$config->set('HTML.SafeObject', true);
$config->set('Output.FlashCompat', true);
$config->set('URI.AllowedSchemes',
        array (
                    'http' => true,
                    'https' => true,
                    'mailto' => true
                ));
$def = $config->getHTMLDefinition(true);
$def->addAttribute('a', 'target', 'Enum#_blank,_self,_parent,_top');
$def->addAttribute('a', 'data-width', 'Text');
$def->addAttribute('a', 'data-height', 'Text');
$def->addAttribute('a', 'id', 'Text');
$def->addAttribute('a', 'name', 'Text');
$purifier = new HTMLPurifier($config);
$va = $purifier->purify($va);

HTML Purifier replaces the & character in the link with <. How can I prevent this?


Solution

  • When I run your code, I get the desired result:

    <?php
    ini_set('display_errors', TRUE);
    error_reporting(E_ALL);
    
    include_once 'library/HTMLPurifier.auto.php';
    
    $raw = 'Some text http://www.stackoverflow.com?var=1&var2=2 more text';
    
    $config = HTMLPurifier_Config::createDefault();
    $config->set('URI.MakeAbsolute', false);
    $config->set('HTML.SafeObject', true);
    $config->set('Output.FlashCompat', true);
    $config->set('URI.AllowedSchemes',
            array (
                        'http' => true,
                        'https' => true,
                        'mailto' => true
                    ));
    $def = $config->getHTMLDefinition(true);
    $def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top');
    $def->addAttribute('a', 'data-width', 'Text');
    $def->addAttribute('a', 'data-height', 'Text');
    $def->addAttribute('a', 'id', 'Text');
    $def->addAttribute('a', 'name', 'Text');
    $purifier = new HTMLPurifier($config);
    
    echo $purifier->purify($raw);
    

    The output is:

    Some text http://www.stackoverflow.com?var=1&amp;var2=2 more text
    

    Notice that the ampersand has been escaped as &amp;, which is the correct encoding inside HTML; a browser renders it as a plain &. The problem is most likely elsewhere in your code.
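
One possible cause, offered only as a guess since the rest of the asker's code is not shown, is that the purified string gets escaped a second time before it is printed. The minimal sketch below uses only PHP built-ins. The variable `$purified` is a hypothetical stand-in for the purifier output quoted above, not a value produced by running the library here.

```php
<?php
// Hypothetical stand-in for the purified output shown in the answer.
$purified = 'Some text http://www.stackoverflow.com?var=1&amp;var2=2 more text';

// A browser decodes entities when rendering; html_entity_decode() mimics that,
// so the visible text contains a plain ampersand again.
$rendered = html_entity_decode($purified, ENT_QUOTES, 'UTF-8');
echo $rendered, PHP_EOL;

// Escaping already-purified output again double-encodes the ampersand,
// which a browser would then display literally as "&amp;".
$doubleEscaped = htmlspecialchars($purified, ENT_QUOTES, 'UTF-8');
echo $doubleEscaped, PHP_EOL;
```

If the second line of output resembles what you see in the page source, check whether the purified value is being passed through `htmlspecialchars()` or a template engine's auto-escaping after purification.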