
PHP HTML strip_tags all except some and remove styling from within tag


The HTML looks like this:

$html = 'SOME TEXT<p style="border-top: 0.0px;border-right: 0.0px;vertical-align: baseline;border-bottom: 0.0px;color: #000000;padding-bottom: 0.0px;padding-top: 0.0px;padding-left: 0.0px;margin: 0.0px;border-left: 0.0px;padding-right: 0.0px;background-color: #ffffff;">SOME TEXT';

I tried strip_tags($html, '<p>'); to remove everything except for <p>, but that preserves the style attribute on the tag.

I want the above to be replaced with just <p>

What's the best approach?

Thanks!


Solution

  • The simplest solution for this would be something based on preg_replace().

    $html = 'SOME TEXT<p style="border-top: 0.0px;border-right: 0.0px;vertical-align: baseline;border-bottom: 0.0px;color: #000000;padding-bottom: 0.0px;padding-top: 0.0px;padding-left: 0.0px;margin: 0.0px;border-left: 0.0px;padding-right: 0.0px;background-color: #ffffff;">SOME TEXT';
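    // Drop every tag except <p>; attributes on the kept tags survive this step.
    $html = strip_tags($html, '<p>');
    // Remove a quoted style attribute (hyphen placed last in the class so it is plainly literal).
    $html = preg_replace('/\sstyle=["\'][A-Za-z0-9:\s.;#-]+["\']/', '', $html);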
    

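    As always, be careful when trying to parse HTML with a regex. This pattern does not check that it is inside a tag, so it would also fire if the text inside the <p> element contained something formatted like a style attribute. (Something like <p>If I typed style="color:red" inside the tags, it would also be removed</p>) It also only matches values made up of the characters listed in the character class, so a style containing, say, a comma or parentheses would be left in place.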
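
    To make that caveat concrete, here is a minimal sketch comparing the pattern above with a slightly stricter variation. The sample string and the second pattern are assumptions for illustration, not part of the original answer; the variation only strips style="..." when it appears inside an opening <p ...> tag.

    ```php
    <?php
    // Hypothetical sample: the paragraph text itself contains style="color:red".
    $html = '<p style="color: #000000;">If I typed style="color:red" here</p>';

    // Pattern from the answer: matches anywhere, including plain text.
    $naive = preg_replace('/\sstyle=["\'][A-Za-z0-9:\s.;#-]+["\']/', '', $html);

    // Variation (an assumption): require the attribute to sit inside an opening
    // <p ...> tag, and accept any value up to the matching closing quote.
    $scoped = preg_replace('/(<p\b[^>]*?)\sstyle=(["\']).*?\2([^>]*>)/i', '$1$3', $html);

    echo $naive, PHP_EOL;   // <p>If I typed here</p>
    echo $scoped, PHP_EOL;  // <p>If I typed style="color:red" here</p>
    ```

    The scoped version is still a regex working on HTML, so it can be fooled by unusual markup (for example a > inside an attribute value); it just narrows the most obvious false positive.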

    The next step to make something like this more robust would be to actually parse the string as an HTML document using the DOMDocument class and remove the style attribute from each element. Whether that is worth it depends on how robust you need this to be. However, this method could change your string in unexpected ways; for instance, loading your string into a DOMDocument would cause a closing </p> tag to be added. That kind of normalization may or may not be useful for you.
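
    A minimal sketch of the DOMDocument route, assuming a shortened version of the sample string and a hypothetical <div> wrapper that exists only so the fragment can be read back without the <html>/<body> shell the parser adds:

    ```php
    <?php
    // Shortened sample; the style value here is illustrative.
    $html = 'SOME TEXT<p style="color: #000000;margin: 0.0px;">SOME TEXT';
    $html = strip_tags($html, '<p>');

    // Collect parser warnings quietly instead of emitting them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Hypothetical <div> wrapper: gives the fragment one root to read back from.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();

    // Strip the style attribute from every <p> element.
    foreach ($doc->getElementsByTagName('p') as $p) {
        $p->removeAttribute('style');
    }

    // Serialize only the wrapper's children, not the added document shell.
    $clean = '';
    foreach ($doc->getElementsByTagName('div')->item(0)->childNodes as $child) {
        $clean .= $doc->saveHTML($child);
    }
    echo $clean, PHP_EOL;
    ```

    Note that the unclosed <p> comes back closed, which is the side effect mentioned above. This sketch uses ASCII input only; text in other encodings would likely need the encoding declared before loading.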