htmlpurifier nofollow only working for generated links


I'm using the HTML Purifier library for my website's comment formatting.

However, I noticed something. In my settings, I have HTML.Nofollow set to true.

'HTML.Nofollow' => true,

But when I write an anchor link by hand, nofollow isn't added. With a generated (auto-linkified) link, it works just fine.

Example Input:

<a href="www.google.com">google</a>

https://www.yahoo.com

Output:

<p><a href="www.google.com">google</a></p>

<p><a href="https://www.yahoo.com" rel="nofollow">https://www.yahoo.com</a></p>

Here are my settings:

return [
    'encoding'      => 'UTF-8',
    'finalize'      => true,
    'cachePath'     => storage_path('app/purifier'),
    'cacheFileMode' => 0755,
    'settings'      => [
        'default' => [
            'HTML.Doctype'             => 'HTML 4.01 Transitional',
            'HTML.Allowed'             => 'b,i,u,ul,ol,li,p,blockquote,table,tr,th,td,a[href|title],sup,sub,span,code',
            'AutoFormat.AutoParagraph' => true,
            'AutoFormat.RemoveEmpty'   => true,
            'AutoFormat.Linkify'       => true,
            'HTML.Nofollow'            => true,
        ],

    ],

];

I would be fine with removing <a> from the allowed HTML elements, but for some reason that stops generated URLs from working at all.


Solution

  • <a href="www.google.com">google</a> has no scheme (http://, https://, ftp://, etc.), so it is a relative link that resolves to something like http://example.com/www.google.com (the exact result depends on the path of the page it appears on). nofollow is typically used to keep outgoing links from passing link juice to other sites, and HTML.Nofollow is described as applying to outgoing links, so relative links like this one are exempt.

    Try it with something like this instead:

    <a href="https://www.google.com">google</a>
    

    (As a bonus, the link will actually work this way.)
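
The scheme-less versus absolute distinction above can be checked with PHP's built-in parse_url(). This is only a rough sketch of the idea, not HTML Purifier's actual implementation: hrefHasHost() is a hypothetical helper name, and it assumes that "outgoing" simply means "the href names a host".

```php
<?php

/**
 * Hypothetical helper (not part of HTML Purifier): reports whether an href
 * names a host, i.e. whether it can point at another site at all.
 */
function hrefHasHost(string $href): bool
{
    $parts = parse_url($href);

    // parse_url() returns false for seriously malformed URLs.
    if ($parts === false) {
        return false;
    }

    return isset($parts['host']) && $parts['host'] !== '';
}

var_dump(hrefHasHost('www.google.com'));         // bool(false): parsed as a path only
var_dump(hrefHasHost('https://www.google.com')); // bool(true): scheme and host present
```

If a check like this returns false for a link you meant to be external, the href is missing its scheme, and the nofollow behavior described above would follow from that.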