Setting up an internal redirect for submitted links with HTMLPurifier (Munge.php)

I'm trying to set up an internal redirect with the help of the HTMLPurifier library. Most of the documentation is pretty straightforward, and the directive for this purpose, URI.Munge, already exists, but there are no examples (anywhere on the internet) of how to make it work. I'm not a highly skilled programmer, and the documentation doesn't make it clear to me how to set this up, even with the additional examples here. I'm especially unsure where to set %s, %r, %t etc. and URI.MungeSecretKey.

Text containing external links is submitted through TinyMCE. In my PHP code I have allowed a href in the configuration and added the URI.Munge filter class:

$uri = $config->getURIDefinition(true);
$uri->addFilter(new HTMLPurifier_URIFilter_Munge(), $config);

I guess there should be some additional code here...

Submitted code:

<a href="http://example.com">Link</a>

The output I get:

<a href="">Link</a>

Expected output:

<a href="/redirect?s=http%3A%2F%2Fexample.com&amp;t=c15354f3953dfec262c55b1403067e0d045a3059&amp;r=&amp;n=a&amp;m=href&amp;p=">Link</a>

Can someone give me a clue on how to achieve this? Or at least some sample code.

Solution

The configuration directives documented at http://htmlpurifier.org/live/configdoc/plain.html are generally set like this:

require_once '/path/to/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('URI.Munge', ...);
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);

The URI Filters page is for when you want to write your own custom filters, not for when you want to use the built-in ones! :)

What you probably want is this:

require_once '/path/to/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('URI.Munge', '/redirect?s=%s');
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);
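The other placeholders from the question go into the same format string. This is only a sketch, assuming the placeholder meanings listed in the URI.Munge documentation: %s is the URL-encoded original URI, %r is 1 for an embedded resource and blank for a plain link, %n is the tag name, %m is the attribute name, and %p is the CSS property name (or blank). The /redirect path and the parameter names are simply the ones used in the question.

```php
<?php
require_once '/path/to/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// Each %x placeholder is substituted by HTML Purifier for every munged URI.
$config->set('URI.Munge', '/redirect?s=%s&r=%r&n=%n&m=%m&p=%p');
$purifier = new HTMLPurifier($config);

// For the link from the question, %n should become "a" and %m "href".
echo $purifier->purify('<a href="http://example.com">Link</a>');
```

No addFilter() call is needed here; setting the directive is what enables the built-in filter.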

URI.MungeSecretKey partially prevents abuse of your redirection end-point as an arbitrary (open) redirector. That abuse is bad because malevolent people will embed links like https://good.example.com/redirect?s=https://evil.example.com into emails, so that it superficially appears they're linking to https://good.example.com and their link gets a trust boost. Yes, this happens, and yes, it works to some degree (and unfortunately, "some degree" is all it takes for schemes like this to be profitable to someone).

If you supply e.g.:

require_once '/path/to/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('URI.Munge', '/redirect?s=%s&t=%t');
$config->set('URI.MungeSecretKey', 'foobar');
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);

Then a URL https://friendly.example.com will be turned into (I'm leaving out urlencode() on the URL for legibility):

/redirect?s=https://friendly.example.com&t=41934069b21c4a3892b61194b90fc537970106ebc9fe79961930a0888b22245f

41934069b21c4a3892b61194b90fc537970106ebc9fe79961930a0888b22245f is the result of hash_hmac("sha256", "https://friendly.example.com", "foobar"). Your redirect end-point can check this value by computing hash_hmac("sha256", $_GET['s'], "foobar") and comparing it against $_GET['t'] (ideally with hash_equals()). If those two match, the link was generated by someone who knows the secret key, which in practice means by your HTML Purifier setup.
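The check described above could be sketched as a small redirect script. This is a minimal example rather than a finished end-point: the function name verified_redirect_target() and the 400 response are my own choices, and it assumes the token is the SHA-256 HMAC of the decoded URL, as described above.

```php
<?php
// Must be the same value as URI.MungeSecretKey ('foobar' is only the example key).
$secret = 'foobar';

/**
 * Returns the redirect target if its token is valid, or null otherwise.
 * $query is expected to look like $_GET (PHP has already URL-decoded it).
 */
function verified_redirect_target(array $query, string $secret): ?string
{
    $target = $query['s'] ?? '';
    $token  = $query['t'] ?? '';
    if (!is_string($target) || !is_string($token) || $target === '') {
        return null;
    }
    $expected = hash_hmac('sha256', $target, $secret);
    // hash_equals() compares in constant time, which avoids leaking
    // information about the expected token through timing.
    return hash_equals($expected, $token) ? $target : null;
}

// Only act when the script is actually called with an "s" parameter.
if (isset($_GET['s'])) {
    $target = verified_redirect_target($_GET, $secret);
    if ($target === null) {
        http_response_code(400);
        exit('Invalid redirect link.');
    }
    header('Location: ' . $target, true, 302);
    exit;
}
```

Keeping the check in a function makes it easy to test without sending headers, and it means a missing or tampered t parameter is rejected before any redirect happens.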

By itself, this isn't a perfect protection against redirector abuse, but it helps.

The shortcomings are described in the documentation of URI.MungeSecretKey:

Please note that it would still be possible for an attacker to procure secure hashes en-mass by abusing your website's Preview feature or the like, but this service affords an additional level of protection that should be combined with website blacklisting.
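The "website blacklisting" part is something your redirect end-point has to do itself. A rough sketch of what that could look like, assuming a hypothetical hard-coded list of blocked hosts (in practice the list would come from your own data) and a helper name, is_blocked_host(), that I made up:

```php
<?php
// Hypothetical blocklist, purely for illustration.
$blockedHosts = ['evil.example.com'];

/**
 * Returns true if the URL's host is on the blocklist or cannot be determined.
 */
function is_blocked_host(string $url, array $blockedHosts): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        // No host or malformed URL: refuse rather than guess.
        return true;
    }
    // Host names are case-insensitive, so compare in lower case.
    return in_array(strtolower($host), $blockedHosts, true);
}
```

The end-point would call this after the hash check has passed and refuse to redirect when it returns true.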

Does that help?