php, html-parsing, domdocument, html-manipulation

Using PHP, how do I remove HTML text after/before a certain number of <br> tags?


Using PHP, how can I remove HTML text that is placed before/after a certain number of <br> tags?

For example, I have this:

<div>
    <div><img sec=""></div>
    <br>
    <h3>title</h3>
    <span>some text here</span>
    <br>
    Some text that I want to remove.
    <br>
    <br>
</div>

I'd like to remove the text that comes before the last two <br> tags. Put another way, it is the text after the second <br>.

I tried splitting the string on <br> with explode() and dropping the last two array elements with array_pop(). However, I then had to append </div> by hand to close the outer tag, which is not a good idea when the outer tag changes dynamically.

Does anybody have a solution for this?


Solution

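  • Okay, this is what I came up with. It may not be the most efficient way, but I'll share it. I used a DOMinnerHTML() helper (not my own; it is included at the bottom of the code) together with preg_split(). The code splits the inner HTML of the outer <div> at every <br> and drops the last three pieces, so everything after the third-to-last <br> is removed. Note that the <br> tags themselves are lost as well, because implode() joins the remaining pieces with a space.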

    <?php 
    $html = <<<STR
    <div>
        <div><img sec=""></div>
        <br>
        <h3>title</h3>
        <span>some text here</span>
        <br>
        Some text that I want to remove.
        <br>
        <br>
    </div>
    STR;
    
    $doc = new DOMDocument;
    $doc->loadHTML($html);
    $node = $doc->getElementsByTagName('div')->item(0);
    $innerHtml = DOMinnerHTML($node);
    $arrHtml = preg_split('/<br\b[^>]*>/i', $innerHtml);     // split the string into an array at each <br> or <br />
    array_splice($arrHtml, -3);     // remove the last three elements
    $edited = implode(" ", $arrHtml);
    
    echo $edited;
    
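    // Returns the serialized HTML of all child nodes of $element (its "inner HTML").
    function DOMinnerHTML($element)
    {
        $innerHTML = "";
        foreach ($element->childNodes as $child) {
            // Copy each child into a scratch document so it can be serialized on its own.
            $tmp_dom = new DOMDocument();
            $tmp_dom->appendChild($tmp_dom->importNode($child, true));
            $innerHTML .= trim($tmp_dom->saveHTML());
        }
        return $innerHTML;
    }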
    ?>
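
As a possible alternative to splitting the serialized string, the same idea can be sketched directly on the DOM, so the outer tag never has to be re-closed by hand. The helper name removeBeforeLastBrs() and its $keep parameter are hypothetical (they are not part of the answer above), and the sketch assumes the <br> tags are direct children of the container, as in the sample markup. Unlike the code above, it removes only the nodes between the third-to-last and second-to-last <br> and leaves the <br> tags in place.

```php
<?php
// Hypothetical helper (not from the original answer): deletes the nodes that sit
// directly before the $keep-th <br> counted from the end of $container, walking
// backwards until the previous <br> or the start of the container is reached.
// Assumption: the <br> elements are direct children of $container.
function removeBeforeLastBrs(DOMElement $container, int $keep = 2): void
{
    // Collect only the <br> elements that are direct children.
    $brs = [];
    foreach ($container->childNodes as $child) {
        if ($child instanceof DOMElement && strtolower($child->nodeName) === 'br') {
            $brs[] = $child;
        }
    }
    if ($keep < 1 || count($brs) < $keep) {
        return; // not enough <br> tags, nothing to remove
    }

    // Start just before the first of the last $keep <br> tags and walk backwards.
    $node = $brs[count($brs) - $keep]->previousSibling;
    while ($node !== null && strtolower($node->nodeName) !== 'br') {
        $previous = $node->previousSibling; // remember it before detaching $node
        $container->removeChild($node);
        $node = $previous;
    }
}

// Same sample markup as in the question.
$html = <<<HTML
<div>
    <div><img sec=""></div>
    <br>
    <h3>title</h3>
    <span>some text here</span>
    <br>
    Some text that I want to remove.
    <br>
    <br>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$outer = $doc->getElementsByTagName('div')->item(0);

removeBeforeLastBrs($outer);

// Serialize only the outer element; its own tags stay intact whatever they are.
$result = $doc->saveHTML($outer);
echo $result;
```

Because the nodes are removed in place and the container is serialized with saveHTML($outer), the outer element can be a <div>, a <section>, or anything else without any change to the code.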