php codeigniter htmlpurifier purify

HTML Purifier: How to prevent the href attribute of anchor tags from being removed


I am stuck on configuring HTML Purifier so that it does not remove the href attribute of anchor tags.

Current output (href attribute removed):

[screenshot]

Expected output (with href attribute):

[screenshot]

Below is my HTML Purifier function:

function html_purify($content)
{
    if (hooks()->apply_filters('html_purify_content', true) === false) {
        return $content;
    }

    $CI = &get_instance();
    $CI->load->config('migration');

    $config = HTMLPurifier_HTML5Config::create(
        HTMLPurifier_HTML5Config::createDefault()
    );

    $config->set('HTML.DefinitionID', 'CustomHTML5');
    $config->set('HTML.DefinitionRev', $CI->config->item('migration_version'));

    // Disables cache
    // $config->set('Cache.DefinitionImpl', null);

    $config->set('HTML.SafeIframe', true);
    $config->set('Attr.AllowedFrameTargets', ['_blank']);
    $config->set('Core.EscapeNonASCIICharacters', true);
    $config->set('CSS.AllowTricky', true);

    // These config options disable the pixel checks and allow
    // values such as width="auto" or height="auto", for example on images
    $config->set('HTML.MaxImgLength', null);
    $config->set('CSS.MaxImgLength', null);

    //Customize - Allow image data
    $config->set('URI.AllowedSchemes', array('data' => true));

    //allow YouTube and Vimeo
    $regex = hooks()->apply_filters('html_purify_safe_iframe_regexp', '%^(https?:)?//(www\.youtube(?:-nocookie)?\.com/embed/|player\.vimeo\.com/video/)%');

    $config->set('URI.SafeIframeRegexp', $regex);
    hooks()->apply_filters('html_purifier_config', $config);

    $def = $config->maybeGetRawHTMLDefinition();

    if ($def) {
        $def->addAttribute('p', 'pagebreak', 'Text');
        $def->addAttribute('div', 'align', 'Enum#left,right,center');
        $def->addElement(
            'iframe',
            'Inline',
            'Flow',
            'Common',
            [
                'src'                   => 'URI#embedded',
                'width'                 => 'Length',
                'height'                => 'Length',
                'name'                  => 'ID',
                'scrolling'             => 'Enum#yes,no,auto',
                'frameborder'           => 'Enum#0,1',
                'allow'                 => 'Text',
                'allowfullscreen'       => 'Bool',
                'webkitallowfullscreen' => 'Bool',
                'mozallowfullscreen'    => 'Bool',
                'longdesc'              => 'URI',
                'marginheight'          => 'Pixels',
                'marginwidth'           => 'Pixels',
            ]
        );
    }

    $purifier = new HTMLPurifier($config);

    return $purifier->purify($content);
}

What configuration should I add so that the href attribute is kept on anchor tags?


Solution

  • URI.AllowedSchemes is a whitelist, so setting it to array('data' => true) allows only data URLs and excludes every other scheme. That makes the URL https://google.com a disallowed value for href, so the href ends up empty, and the empty href is stripped.

    If you want to expand the default whitelist, here it is for reference:

    array (
      'http' => true,
      'https' => true,
      'mailto' => true,
      'ftp' => true,
      'nntp' => true,
      'news' => true,
      'tel' => true,
    )
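
    As a minimal sketch of that fix, assuming the same $config object built in the question's html_purify() function, the existing URI.AllowedSchemes line could be replaced with the default list plus data. The call has to stay before new HTMLPurifier($config), as it does in the question; nothing else in the function should need to change:

    ```php
    // Keep the default schemes and add 'data' for inline images.
    // Replaces: $config->set('URI.AllowedSchemes', array('data' => true));
    $config->set('URI.AllowedSchemes', [
        'http'   => true,
        'https'  => true,
        'mailto' => true,
        'ftp'    => true,
        'nntp'   => true,
        'news'   => true,
        'tel'    => true,
        'data'   => true, // the extra scheme the original code wanted
    ]);
    ```

    With http and https back in the whitelist, an href such as https://google.com should no longer be treated as disallowed. Any scheme you do not actually need can be left out of the list.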