
simple_html_dom parser: filter found tag


I would like to parse HTML with a structure like this:

<p class=class1>
 <i>.</i>
 <b>..</b>
 <a class=class2></a>
</p>

I need to get the whole content of the <p>, but without the <a> tags. All the other tags, like <i> or <b>, must stay. How can I do that?

So far I only have this code:

 $content = $page->find('p[class=class1]');
 $inner = array();
 foreach ($content as $text) {
     // innertext() returns the full inner HTML, <a> tags included
     $inner[] = $text->innertext();
 }

It finds the whole content, but the <a> tags are still included.


Solution

  • You could loop over the child nodes and check nodeName(). If it is an a, set its outertext to an empty string:

    Try it like this:

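    include 'simple_html_dom.php'; // load the simple_html_dom library

    $data = <<<DATA
    <p class=class1>
     content
     <div>test</div>
     <i>.</i>
     <b>..</b>
    <a class=class2></a>
    </p>
    DATA;
    $html = str_get_html($data);
    
    foreach($html->find('p.class1') as $element) {
        // Direct children only: anchors nested deeper are not visited here
        foreach ($element->children as $child) {
            if ($child->nodeName() === "a") {
                // Emptying outertext drops the whole <a>...</a> from the output
                $child->outertext = '';
            }
        }
    }
    
    echo $html->save();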
    

    That would give you:

    <p class=class1> content <div>test</div> <i>.</i> <b>..</b> </p>

    Or, if you want to remove all anchors, including nested ones:

    foreach ($html->find('p.class1 a') as $element) {
        $element->outertext = '';
    }
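If simple_html_dom is not available, the same idea (find the paragraph, drop its anchors, keep everything else) can be sketched with PHP's bundled DOM extension (DOMDocument and DOMXPath). This is a different library from the one used above, and the helper name innerHtmlWithoutAnchors is only an illustrative choice. Unlike the answer's code, it returns the inner HTML of each matching <p> rather than the whole document, which is closer to what the question asks for.

```php
<?php
// Sketch using PHP's bundled DOM extension, NOT simple_html_dom.
// innerHtmlWithoutAnchors() is a hypothetical helper name.

/**
 * Returns the inner HTML of every <p> carrying the given class,
 * with all <a> elements (nested ones too) removed.
 */
function innerHtmlWithoutAnchors(string $html, string $class): array
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings instead of emitting them, then restore.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match <p> elements whose class list contains $class as a whole token.
    $query = sprintf(
        '//p[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $result = [];
    foreach ($xpath->query($query) as $p) {
        // Copy the anchors into an array before detaching them from the tree.
        $anchors = iterator_to_array($xpath->query('.//a', $p));
        foreach ($anchors as $a) {
            $a->parentNode->removeChild($a);
        }

        // Serialize each remaining child node to rebuild the inner HTML.
        $inner = '';
        foreach ($p->childNodes as $child) {
            $inner .= $doc->saveHTML($child);
        }
        $result[] = trim($inner);
    }
    return $result;
}

// Usage with the markup from the question.
$html = <<<HTML
<p class=class1>
 <i>.</i>
 <b>..</b>
<a class=class2></a>
</p>
HTML;

print_r(innerHtmlWithoutAnchors($html, 'class1'));
```

Keep in mind that libxml's HTML parser applies HTML nesting rules, so markup that simple_html_dom keeps as written (such as the <div> inside <p> in the test data above) may be restructured, and loadHTML assumes ISO-8859-1 unless the markup declares a charset.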