Tags: javascript, php, tinymce-4, htmlpurifier

Cleaning up HTML using TinyMCE 4 or HTMLPurifier


I have a form with a description field that uses TinyMCE 4 for editing text and images.

Below are my configurations for TinyMCE:

tinymce.init({
    selector: '.tinymce',
    formats: {
        bold: [
            {inline: 'span', styles: {fontWeight: 'bold'}}
        ],
        italic: [
            {inline: 'span', styles: {fontStyle: 'italic'}}
        ],
        underline: [
            {inline: 'span', styles: {textDecoration: 'underline'}, exact: true}
        ],
        strikethrough: [
            {inline: 'span', styles: {textDecoration: 'line-through'}, exact: true}
        ]
    },
    width: '80%',
    height: 200,
    menubar: false,
    statusbar: false,
    plugins: [
        'advlist autolink save link image lists hr',
        'wordcount visualblocks visualchars code media',
        'table contextmenu directionality textcolor colorpicker'
    ],
    toolbar1: 
        'styleselect | bold italic underline subscript superscript strikethrough removeformat | forecolor backcolor | ' + 
        'fontselect | bullist numlist | alignleft aligncenter alignright alignjustify | table | ' + 
        'link unlink image hr | code',
    toolbar_items_size: 'small',
    style_formats: [
        { title: 'Header 1', block: 'h1' }, { title: 'Header 2', block: 'h2' }, { title: 'Header 3', block: 'h3' },
        { title: 'Header 4', block: 'h4' }, { title: 'Header 5', block: 'h5' }, { title: 'Header 6', block: 'h6' }
    ],
    allow_conditional_comments: false,
    valid_elements: 'a,div,h1,h2,h3,h4,h5,h6,hr,li,ol,p,span[style],sub,sup,table[*],tr[*],td[*],ul,-p',
    extended_valid_elements : 'a[href|target=_blank],img[src|alt|width|height]',
    content_css: [],
    setup: function (editor) {
        // update selector's value when changes are made
        editor.on('change', editor.save);
    }
});

When the form gets submitted, the description field gets sanitized using HTMLPurifier.

Below are my configurations for HTMLPurifier:

$config = HTMLPurifier_Config::createDefault();
$config->set('Core.Encoding', 'UTF-8');
$config->set('HTML.ForbiddenElements', array('applet','embed','iframe','link','script','style','object'));
$config->set('AutoFormat.RemoveEmpty', true);
$config->set('Core.RemoveInvalidImg', true);
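$config->set('URI.AllowedSchemes', array('http' => true, 'https' => true, 'mailto' => true, 'data' => true)); // this replaces the default list, so keep http/https/mailto alongside data URIs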
$purifier = new HTMLPurifier($config);

When data is entered in the description, it is possible to have nested span tags. For example:

<h1><span style="text-decoration: underline; color: #ff6600;"><span style="font-weight: bold; font-style: italic;">sddfdsdfdhjhjkhjkh</span></span></h1>

Question: Is there a way to clean up the HTML (using either TinyMCE or HTMLPurifier) so that nested styles are collapsed wherever possible, e.g. turning the example above into:

<h1><span style="text-decoration: underline; color: #ff6600; font-weight: bold; font-style: italic;">sddfdsdfdhjhjkhjkh</span></h1>

Or better:

<h1 style="text-decoration: underline; color: #ff6600; font-weight: bold; font-style: italic;">sddfdsdfdhjhjkhjkh</h1>
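At its core, collapsing nested styles means merging their CSS declarations, where a later declaration for the same property overrides an earlier one. A minimal PHP sketch of just that merging step, using a hypothetical helper mergeStyles() that is not part of TinyMCE or HTMLPurifier, and assuming no semicolons appear inside values:

```php
<?php
// Hypothetical helper, not part of TinyMCE or HTMLPurifier.
// Merges CSS declaration strings in order; when a property repeats, the later
// value wins (as in the CSS cascade) but the property keeps its first position.
// Assumes values contain no ';' (e.g. no data: URIs inside url(...)).
function mergeStyles(string ...$styles): string
{
    $merged = [];
    foreach ($styles as $style) {
        foreach (explode(';', $style) as $declaration) {
            // Split on the first ':' only, so values such as "url(http://...)" survive
            $parts = explode(':', $declaration, 2);
            if (count($parts) !== 2) {
                continue; // empty or malformed declaration
            }
            $property = strtolower(trim($parts[0]));
            $value = trim($parts[1]);
            if ($property !== '' && $value !== '') {
                $merged[$property] = $value;
            }
        }
    }

    $declarations = [];
    foreach ($merged as $property => $value) {
        $declarations[] = $property . ': ' . $value . ';';
    }
    return implode(' ', $declarations);
}

// The two span styles from the question, outer span first
echo mergeStyles(
    'text-decoration: underline; color: #ff6600;',
    'font-weight: bold; font-style: italic;'
), PHP_EOL;
// prints: text-decoration: underline; color: #ff6600; font-weight: bold; font-style: italic;
```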

Solution

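  • As another answer already pointed out, HTML Purifier cannot do this for you.

    It is still possible, though, to write a small helper function that does what you want.

    Using preg_replace with a regular expression, the following function deletes every "><span" sequence (so the span's attributes end up on the preceding opening tag) and every "</span>", which gives the output you asked for when a single span sits directly inside its parent. It does not merge styles: on the nested spans from the question it leaves two style attributes on the <h1>, and a span that does not directly follow another tag keeps its opening tag but loses its closing tag.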

    function filterSpan($content)
    {
        return preg_replace('/(><span)|(<\/span>)/', '', $content);
    }
    

    This is your unfiltered input example:

    $content = '
    <h1><span style="text-decoration: underline; color: #ff6600; 
    font-weight: bold; font-style: italic;">sddfdsdfdhjhjkhjkh</span></h1>
    ';
    

    And here is the output after calling filterSpan($content):

    <h1 style="text-decoration: underline; color: #ff6600; 
    font-weight: bold; font-style: italic;">sddfdsdfdhjhjkhjkh</h1>
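The regex above only covers the single-span case. A minimal sketch of a more structural alternative, assuming PHP's bundled DOM extension is available (this is neither HTML Purifier nor the regex approach): parse the fragment with DOMDocument and, whenever a span is the only child node of its parent element, append the span's style to the parent and unwrap the span. The function name collapseSpans() is hypothetical, and any span attribute other than style is discarded, which matches the span[style] rule in the TinyMCE config above.

```php
<?php
// Hypothetical helper: collapse a <span> into its parent element when the span
// is the parent's only child node, appending the span's style to the parent's.
// Requires PHP's bundled DOM extension (ext-dom).
function collapseSpans(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // libxml warns about fragment/HTML5 markup
    // The meta tag makes libxml read the fragment as UTF-8 instead of ISO-8859-1
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body>' . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);

    // Snapshot the live node list; reversing document order handles inner spans before outer ones
    $spans = array_reverse(iterator_to_array($doc->getElementsByTagName('span')));

    foreach ($spans as $span) {
        $parent = $span->parentNode;
        // Only collapse when the span is the sole child of an element other than <body>
        if (!($parent instanceof DOMElement)
            || $parent->isSameNode($body)
            || $parent->childNodes->length !== 1) {
            continue;
        }

        $inner = trim($span->getAttribute('style'));
        if ($inner !== '') {
            $outer = rtrim(trim($parent->getAttribute('style')), '; ');
            // Later declarations win in CSS, so appending keeps the span's precedence
            $parent->setAttribute('style', $outer === '' ? $inner : $outer . '; ' . $inner);
        }

        // Move the span's children up into the parent, then drop the now-empty span
        while ($span->firstChild !== null) {
            $parent->insertBefore($span->firstChild, $span);
        }
        $parent->removeChild($span);
    }

    // Serialize only the fragment, not the wrapper document
    $result = '';
    foreach ($body->childNodes as $child) {
        $result .= $doc->saveHTML($child);
    }
    return $result;
}

$nested = '<h1><span style="text-decoration: underline; color: #ff6600;">'
    . '<span style="font-weight: bold; font-style: italic;">sddfdsdfdhjhjkhjkh</span></span></h1>';
echo collapseSpans($nested), PHP_EOL;
```

Calling it before $purifier->purify() keeps HTML Purifier as the last step that touches the markup. Note that hoisting some inline styles onto a block element changes how they render: background-color on an <h1> fills the whole line, whereas on a span it only covers the text.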