I'd like to use HTML Purifier to transform <body> tags to <div> tags, to preserve inline styling on the <body> element, e.g. <body style="background-color:#000000;">Hi there.</body> would turn into <div style="background-color:#000000;">Hi there.</div>. I'm looking at a combination of a custom tag and a TagTransform class.
In my configuration section, I'm currently doing this:
$htmlDef = $this->configuration->getHTMLDefinition(true);
// defining the element to avoid triggering 'Element 'body' is not supported'
$bodyElem = $htmlDef->addElement('body', 'Block', 'Flow', 'Core');
$bodyElem->excludes = array('body' => true);
// add the transformation rule
$htmlDef->info_tag_transform['body'] = new HTMLPurifier_TagTransform_Simple('div');
...as well as allowing <body> and its style (and class, and id) attributes via the configuration directives (they're part of a working, large list that's parsed into HTML.AllowedElements and HTML.AllowedAttributes).
I've turned definition caching off.
$config->set('Cache.DefinitionImpl', null);
Unfortunately, in this setup, it seems like HTMLPurifier_TagTransform_Simple never has its transform() method called.
I presume the culprit is my HTML.Parent, which is set to 'div': quite naturally, <div> does not allow a child <body> element. However, setting HTML.Parent to 'html' nets me:
ErrorException: Cannot use unrecognized element as parent
Adding...
$htmlElem = $htmlDef->addElement('html', 'Block', 'Flow', 'Core');
$htmlElem->excludes = array('html' => true);
...gets rid of that error message but still doesn't transform the tag - it's removed instead.
Adding...
$htmlElem = $htmlDef->addElement('html', 'Block', 'Custom: head?, body', 'Core');
$htmlElem->excludes = array('html' => true);
...also does nothing, because it nets me an error message:
ErrorException: Trying to get property of non-object
[...]/library/HTMLPurifier/Strategy/FixNesting.php:237
[...]/library/HTMLPurifier/Strategy/Composite.php:18
[...]/library/HTMLPurifier.php:181
[...]
I'm still tweaking around with the last option now, trying to figure out the exact syntax I need to provide, but if someone knows how to help me based on their own past experience, I'd appreciate any pointers in the right direction.
The only other culprit I can imagine is my HTML.TidyLevel, which is set to 'heavy'. I've yet to try all possible combinations of this, but so far it has made no difference.
(Since I've only been working on this on the side, I struggle to recall which combinations I've already tried; otherwise I would list them here. As it is, I'm not confident I wouldn't miss or misreport something I've done. I might edit this section later when I've done some dedicated testing, though!)
My configuration data is stored in JSON and then parsed into HTML Purifier. Here's the file:
{
    "CSS" : {
        "MaxImgLength" : "800px"
    },
    "Core" : {
        "CollectErrors" : true,
        "HiddenElements" : {
            "script" : true,
            "style" : true,
            "iframe" : true,
            "noframes" : true
        },
        "RemoveInvalidImg" : false
    },
    "Filter" : {
        "ExtractStyleBlocks" : true
    },
    "HTML" : {
        "MaxImgLength" : 800,
        "TidyLevel" : "heavy",
        "Doctype" : "XHTML 1.0 Transitional",
        "Parent" : "html"
    },
    "Output" : {
        "TidyFormat" : true
    },
    "Test" : {
        "ForceNoIconv" : true
    },
    "URI" : {
        "AllowedSchemes" : {
            "http" : true,
            "https" : true,
            "mailto" : true,
            "ftp" : true
        },
        "DisableExternalResources" : true
    }
}
(URI.Base, URI.Munge and Cache.SerializerPath are also set, but I've removed them in this paste. Also, a caveat on HTML.Parent: as mentioned, this is usually set to 'div'.)
This code is the reason why what you're doing doesn't work:
/**
 * Takes a string of HTML (fragment or document) and returns the content
 * @todo Consider making protected
 */
public function extractBody($html)
{
    $matches = array();
    $result = preg_match('!<body[^>]*>(.*)</body>!is', $html, $matches);
    if ($result) {
        return $matches[1];
    } else {
        return $html;
    }
}
You can turn it off by setting %Core.ConvertDocumentToFragment to false; if the rest of your code is bug-free, it should work straight from there. I don't believe your bodyElem definition is necessary.
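To see why the transform never fires, here is a minimal standalone sketch of what that regex does to the example input from the question. The function name extractBodySketch is hypothetical and only used for illustration; the pattern itself is copied from the snippet above, and nothing here calls into HTML Purifier.

```php
<?php
// Standalone copy of the regex quoted above, for illustration only.
// extractBodySketch is a hypothetical name, not part of HTML Purifier.
function extractBodySketch(string $html): string
{
    $matches = array();
    // Capture everything between the opening and closing <body> tags.
    $result = preg_match('!<body[^>]*>(.*)</body>!is', $html, $matches);
    return $result ? $matches[1] : $html;
}

$input = '<body style="background-color:#000000;">Hi there.</body>';

// The <body> tag and its style attribute are dropped here, so a tag
// transform registered for 'body' would have nothing left to match.
echo extractBodySketch($input), PHP_EOL;                  // Hi there.

// Input without a <body> element passes through unchanged.
echo extractBodySketch('<div>Hi there.</div>'), PHP_EOL;  // <div>Hi there.</div>
```

If that matches what you're seeing, setting the directive the same way you set the cache option, i.e. $config->set('Core.ConvertDocumentToFragment', false);, should let the <body> tag reach your tag transform.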