php html regex wordpress preg-match-all

How to use preg_match_all to get the text inside "<h3>" and "<h3> <a> </a> </h3>" tags


Hello, I am currently building an automatic table of contents for my WordPress site. My reference is https://webdeasy.de/en/wordpress-table-of-contents-without-plugin/

Problem: everything works unless an <h3> tag contains an <a> link. In that case the heading text is missing from $names.

I think the problem is in this regex:

preg_match_all("/<h[3,4](?:\sid=\"(.*)\")?(?:.*)?>(.*)<\/h[3,4]>/", $content, $matches);

// get text under <h3> or <h4> tag.
$names = $matches[2];
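
The symptom can be reproduced in isolation with a minimal sketch like the one below. The two sample headings and the `$samples` / `$results` names are made up for illustration; only the pattern is taken from the code above. With this input the plain heading comes back as expected, while the linked heading yields an empty string, apparently because the greedy `(?:.*)?` swallows the whole `<a>...</a>` part before the final `>`.

```php
<?php
// Pattern copied unchanged from the snippet above.
$pattern = "/<h[3,4](?:\sid=\"(.*)\")?(?:.*)?>(.*)<\/h[3,4]>/";

// Hypothetical sample headings: one plain, one wrapping a link.
$samples = [
    'plain'  => '<h3>Plain heading</h3>',
    'linked' => '<h3><a href="#x">Linked heading</a></h3>',
];

$results = [];
foreach ($samples as $label => $html) {
    preg_match_all($pattern, $html, $matches);
    // Group 2 is what the original code assigns to $names.
    $results[$label] = $matches[2][0];
}

var_dump($results);
```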

I have tried modifying the regex (I don't really understand it):

preg_match_all("/<h[3,4](?:\sid=\"(.*)\")?(?:.*)?><a(.*)>(.*)<\/a><\/h[3,4]>/", $content, $matches);

// get text under <a> tag.
$names = $matches[4];

The code above works for finding the text in an <h3> <a> a text </a> </h3> tag, but an h3 tag that doesn't contain an <a> tag is then not matched.

My question: how can I combine the two? My expectation is that when the first pattern returns no result, the second pattern is used instead.

Or maybe there is a better solution? Thank you.


Solution

  • Here's a way that will remove any tags inside of header tags

    $html = <<<EOT
    <h3>Here's an <a href="thing.php">alternative solution</a></h3> to using regex. <h3>It may <a name='#thing'>not</a></h3> be the most elegant solution, but it works
    EOT;
    
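    // Match h1-h6 elements only; the backreference \1 pairs each opening tag with its own closing tag.
    preg_match_all('#<h([1-6])\b[^>]*>(.*?)</h\1>#si', $html, $matches);
    foreach ($matches[0] as $num => $whole) {
       // Escape the matched header so it can be searched for literally.
       $look_for = preg_quote($whole, "/");
       // Opening tag without the angle brackets, e.g. "h3" (any attributes are kept).
       $open = str_replace("<", "", explode(">", $whole)[0]);
       // Closing tag name must not carry attributes.
       $tag = "h" . $matches[1][$num];
       $replace_with = "<$open>" . strip_tags($matches[2][$num]) . "</$tag>";
       // Replace only the first occurrence so repeated headers are handled one by one.
       $html = preg_replace("/$look_for/", $replace_with, $html, 1);
    }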
    
    echo "<pre>$html</pre>";
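
If the goal is only the list of heading texts for the table of contents (the `$names` array from the question), the same idea can be applied without rewriting the HTML. The following is a rough sketch, not a drop-in replacement: the sample `$content` string is invented, and the pattern is an assumption that limits matching to h3/h4 and lets `strip_tags` remove any nested `<a>` or other inline tags.

```php
<?php
// Hypothetical sample content; in WordPress this would be the post content.
$content = '<h3 id="intro">Intro</h3><p>Text</p>'
         . '<h3><a href="/setup">Setup guide</a></h3>'
         . '<h4 id="faq">FAQ and <em>tips</em></h4>';

// Match h3/h4 only; \1 pairs each opening tag with its own closing tag,
// and the non-greedy (.*?) captures the inner HTML of each heading.
preg_match_all('#<h([34])\b[^>]*>(.*?)</h\1>#si', $content, $matches);

// Strip nested tags so linked and plain headings both yield plain text.
$names = array_map('strip_tags', $matches[2]);

print_r($names);
```

With this input, `$names` holds `Intro`, `Setup guide` and `FAQ and tips`, so headings with and without a link are handled by one pattern instead of two. For HTML that is not well formed, a DOM parser would likely be more robust than any regex.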