php html preg-replace markdown markup

preg_replace only OUTSIDE tags? (We're not talking full 'html parsing', just a bit of markdown)


What is the easiest way to highlight some text while excluding any text inside the OCCASIONAL tag "<...>"?

CLARIFICATION: I want the existing tags PRESERVED!

$t = preg_replace(
  "/(markdown)/",
  "<strong>$1</strong>",
"This is essentially plain text apart from a few html tags generated with some
simplified markdown rules: <a href=markdown.html>[see here]</a>");

Which should display as follows, with "markdown" in bold:

"This is essentially plain text apart from a few html tags generated with some simplified markdown rules: see here"

... but it must NOT MESS UP the text inside the anchor tag (i.e. <a href=markdown.html>).

I've heard the arguments against parsing HTML with regular expressions, but here we're talking about essentially plain text, with only minimal parsing of some markdown code.


Solution

  • Actually, this seems to work ok:

    <?php
    $item="markdown";
    $t="This is essentially plain text apart from a few html tags generated 
    with some simplified markdown rules: <a href=markdown.html>[see here]</a>";
    
    //_____1. apply emphasis_____
    $t = preg_replace("|($item)|","<strong>$1</strong>",$t);
    
    // "This is essentially plain text apart from a few html tags generated 
    // with some simplified <strong>markdown</strong> rules: <a href=
    // <strong>markdown</strong>.html>[see here]</a>"
    
    //_____2. remove emphasis if WITHIN opening and closing tag____
    $t = preg_replace("|(<[^>]+?)(<strong>($item)</strong>)([^<]+?>)|","$1$3$4",$t);
    
    // $1 is the part of the tag before the match and $4 the part after it;
    // $2 is the whole <strong>..</strong> wrapper and $3 the bare item,
    // so "$1$3$4" rebuilds the tag without the <strong> markup
    
    // "This is essentially plain text apart from a few html tags generated
    // with some simplified <strong>markdown</strong> rules: <a href=markdown.html>
    // [see here]</a>"
    
    ?>
    

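    A string like $item="odd|string" would cause some problems, because | is both the pattern delimiter used here and a regex metacharacter, but I won't be using that kind of string anyway... (it probably needs preg_quote($item, "|") or the like, rather than htmlentities(...))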
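For comparison, the two steps above could probably be collapsed into a single pass. The sketch below is not from the original answer: highlight_outside_tags() is a hypothetical helper name, and it assumes the plain text never contains a stray "<" or ">" outside of tags. It uses a negative lookahead to skip any match whose next angle bracket is ">", and preg_quote() so that an $item such as "odd|string" is matched literally.

```php
<?php
// Hypothetical helper (an assumption, not part of the original answer):
// wraps $item in <strong> only where it occurs outside <...> tags.
function highlight_outside_tags(string $text, string $item): string
{
    // preg_quote() escapes regex metacharacters plus the "~" delimiter,
    // so "odd|string" is treated as literal text.
    $quoted = preg_quote($item, '~');

    // (?![^<>]*>) rejects a match when the next angle bracket after it
    // is ">", i.e. when the match sits inside a tag.
    $pattern = '~' . $quoted . '(?![^<>]*>)~';

    // preg_replace() returns null on error; fall back to the input then.
    return preg_replace($pattern, '<strong>$0</strong>', $text) ?? $text;
}

$t = "some simplified markdown rules: <a href=markdown.html>[see here]</a>";
echo highlight_outside_tags($t, "markdown"), "\n";
// some simplified <strong>markdown</strong> rules: <a href=markdown.html>[see here]</a>
```

The trade-off is the same as in the two-step version: this is pattern matching, not HTML parsing, so a literal ">" in the plain text (as in "a > b") would make it skip any match that comes before that ">".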