The HTML looks like this:
$html = 'SOME TEXT<p style="border-top: 0.0px;border-right: 0.0px;vertical-align: baseline;border-bottom: 0.0px;color: #000000;padding-bottom: 0.0px;padding-top: 0.0px;padding-left: 0.0px;margin: 0.0px;border-left: 0.0px;padding-right: 0.0px;background-color: #ffffff;">SOME TEXT';
I tried strip_tags($html, '<p>'); to remove everything except for <p>, but that preserves the style attribute of the tag.
I want the tag above to be replaced with just <p>.
What's the best approach?
Thanks!
The simplest solution for this would be something based on preg_replace().
$html = 'SOME TEXT<p style="border-top: 0.0px;border-right: 0.0px;vertical-align: baseline;border-bottom: 0.0px;color: #000000;padding-bottom: 0.0px;padding-top: 0.0px;padding-left: 0.0px;margin: 0.0px;border-left: 0.0px;padding-right: 0.0px;background-color: #ffffff;">SOME TEXT';
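// Remove every tag except <p>.
$html = strip_tags($html, '<p>');
// Remove a style="..." or style='...' attribute. The hyphen sits last in the character class so it is read literally.
$html = preg_replace('/\sstyle=["\'][A-Za-z0-9:\s.;#-]+["\']/', '', $html);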
As always, you should be somewhat careful when trying to parse HTML with regex. For instance, this would fail if for some reason the text inside the <p> tag contained something formatted like a style attribute. (Something like <p>If I typed style="color:red" inside the tags, it would also be removed</p>.) It would also leave the attribute in place if its value contained a character outside that character class, such as % or a comma.
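One way to narrow that failure mode while still using regex is to apply the replacement only inside opening tags, so text between the tags is never touched. This is a minimal sketch, assuming attribute values never contain a literal > character; the function name stripStyleInsideTags is made up for illustration.

```php
<?php
// Hypothetical helper: strips style attributes, but only inside opening tags.
function stripStyleInsideTags(string $html): string
{
    return preg_replace_callback(
        // Match an opening tag such as <p ...>. Closing tags start with "</" and are skipped.
        '/<[a-z][^>]*>/i',
        function (array $m): string {
            // Within that tag only, drop style="..." or style='...'.
            // \1 requires the closing quote to match the opening one.
            return preg_replace('/\sstyle=(["\']).*?\1/is', '', $m[0]);
        },
        $html
    );
}

$html = '<p style="color: red;">If I typed style="color:red" here</p>';
echo stripStyleInsideTags($html), "\n";
// prints: <p>If I typed style="color:red" here</p>
```

Because the quoted value is matched with .*? rather than a fixed character class, values containing % or commas are removed as well. It is still a regex working on HTML, so unusual markup can defeat it.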
The next step to make something like this better would be to actually parse the string as a document using the DOMDocument class. Whether that is worth it depends on how robust a solution you need. However, this method could change your string in unexpected ways; for instance, parsing your string with DOMDocument would cause a closing </p> tag to be added. That kind of normalization may or may not be useful for you.
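The DOMDocument route could look roughly like the following. It is a sketch under a few assumptions: the helper name removeStyleAttributes is invented for illustration, the fragment is wrapped in a <div> purely so its contents can be found again after parsing, and the input is plain ASCII (non-ASCII input would need an explicit encoding hint, which is not handled here).

```php
<?php
// Hypothetical helper; the name and signature are illustrative only.
function removeStyleAttributes(string $html): string
{
    $doc = new DOMDocument();

    // Fragments are rarely valid documents, so collect parser warnings
    // internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so its contents are easy to locate later.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Drop the style attribute from every element that has one.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@style]') as $element) {
        $element->removeAttribute('style');
    }

    // Serialize only the wrapper's children, not the <html>/<body>
    // scaffolding that loadHTML() adds around the fragment.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$html = 'SOME TEXT<p style="color: #000000;margin: 0.0px;">SOME TEXT';
echo removeStyleAttributes(strip_tags($html, '<p>')), "\n";
```

Since only real attributes are removed, text that merely looks like style="..." inside a paragraph is left alone. The result has the unclosed <p> closed by the parser, which is the kind of side effect mentioned above.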