I have the same situation as this guy.
Basically, strip_tags removes tags, including broken tags (the term used in the documentation). Is there another way of doing this that doesn't remove < and any text after it when it isn't part of an HTML tag?
I'm currently doing this:
$description = "<p>I am currently <30 years old.</p>";
$body = strip_tags(html_entity_decode($description, ENT_QUOTES, "UTF-8"), "<strong><em><u>");
echo $body;
But the code above will break something like:
<p>I am currently <30 years old.</p>
Into:
I am currently
Here's an eval.in so you guys could see what I mean.
The HTML you have as input is invalid, so that needs fixing. You could first replace all those unclosed < characters with &lt; and then apply html_entity_decode after strip_tags:
$description = "<p>I am currently <30 years old.</p>";
// Escape any "<" that is not closed by ">" before the next "<" or the end of the string.
$description = preg_replace("/<([^>]*?(?=<|$))/", "&lt;$1", $description);
$body = html_entity_decode(strip_tags($description, "<strong><em><u>"),
ENT_NOQUOTES, "UTF-8");
echo $body;
See it on paiza.io
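To make the three steps easier to follow, here is a minimal sketch that wraps them in two helpers and prints the intermediate result. The function names escape_unclosed_lt and strip_tags_keep_text are hypothetical and were chosen only for illustration; the regular expression is the one from the answer.

```php
<?php
// Hypothetical helper names; they are not part of PHP or of the original answer.

// Step 1: escape every "<" that is not closed by ">" before the next "<" or end of input.
function escape_unclosed_lt(string $html): string
{
    // Fall back to the untouched input if preg_replace reports an error (returns null).
    return preg_replace('/<([^>]*?(?=<|$))/', '&lt;$1', $html) ?? $html;
}

// Steps 2 and 3: strip the real tags, then decode entities back into characters.
function strip_tags_keep_text(string $html, string $allowed = ''): string
{
    $escaped  = escape_unclosed_lt($html);
    $stripped = strip_tags($escaped, $allowed);
    return html_entity_decode($stripped, ENT_NOQUOTES, 'UTF-8');
}

// Intermediate result: only the stray "<" is escaped, real tags are untouched.
echo escape_unclosed_lt('<p>I am currently <30 years old.</p>'), "\n";
// <p>I am currently &lt;30 years old.</p>

// Final result with a list of allowed tags.
echo strip_tags_keep_text('<p>I am <em>currently</em> <30 years old.</p>', '<strong><em><u>'), "\n";
// I am <em>currently</em> <30 years old.
```

Because the decoding happens last, any entities that were already present in the input (for example an encoded tag such as &lt;b&gt;) are decoded as well, so escape the result again if it is going to be printed as HTML.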
Alternatively you could use a DOM parser, which in some cases could give better results, but you'll still need to apply the fix first:
$description = "<p>I am currently <30 years old.</p>";
// Same fix as before: escape unclosed "<" so the parser treats it as text.
$description = preg_replace("/<([^>]*?(?=<|$))/", "&lt;$1", $description);
$doc = new DOMDocument();
$doc->loadHTML($description);
$body = $doc->documentElement->textContent;
echo $body;
See it on paiza.io
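A variation of the DOM approach, as a rough sketch: the helper name html_to_text is hypothetical, and the libxml error handling is an addition that the answer does not use; it only keeps parser warnings about imperfect markup out of the output. Unlike strip_tags with an allowed-tags list, textContent drops every tag.

```php
<?php
// Hypothetical helper; the name and structure are illustrative only.
function html_to_text(string $html): string
{
    // Escape unclosed "<" first, exactly as in the regex-based fix.
    $html = preg_replace('/<([^>]*?(?=<|$))/', '&lt;$1', $html) ?? $html;

    // Collect libxml warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // textContent concatenates all text nodes and discards all markup.
    return $doc->documentElement->textContent;
}

echo html_to_text('<p>I am <em>currently</em> <30 years old.</p>'), "\n";
// I am currently <30 years old.
```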