php html attributes simple-html-dom

PHP Simple HTML DOM: remove all attributes from an HTML tag


$html = file_get_html('page.php');

foreach ($html->find('p') as $tag_name) {
    // Everything between "<p" and the first ">" is treated as the attribute string.
    $attr = substr($tag_name->outertext, 2, strpos($tag_name->outertext, ">") - 2);
    $tag_name->outertext = str_replace($attr, "", $tag_name->outertext);
}
echo $html->innertext;

I wrote the code above to strip the attributes from every <p> tag in my HTML page.


My HTML is similar to this:

<p class="..." id = "..." style = "...">some text...</p>
<p class="..." id = "..." style = "...">some text...</p>
<p class="..." id = "..." style = "...">some text...</p>
  <font>
    <p class="..." id = "..." style = "...">some text ...</p>
    <p class="..." id = "..." style = "...">some text ...</p>
  </font>
<p class="..." id = "..." style = "...">some text...</p>


When I run the PHP code, the result is this:

<p>some text...</p>
<p>some text...</p>
<p>some text...</p>
  <font>
    <p class="..." id = "..." style = "...">some text ...</p>
    <p class="..." id = "..." style = "...">some text ...</p>
  </font>
<p>some text...</p>

It doesn't remove the attributes of the <p> tags that are inside <font>.
If anyone can help me with this, I'd appreciate it.


Solution

  • When I use your code and example HTML, it does remove all the attributes from all the <p> tags, even the ones inside <font>, so I'm not sure why yours isn't working.

    But it looks like Simple HTML DOM has methods that deal with attributes directly, so you don't have to use string functions:

    $html = file_get_html('page.php');

    foreach ($html->find('p') as $p) {
        // Remove each attribute by name instead of editing the tag's markup.
        foreach ($p->getAllAttributes() as $attr => $val) {
            $p->removeAttribute($attr);
        }
    }
    echo $html->innertext;

    Hopefully that will be more effective.
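
If the library keeps misbehaving on your page, the same attribute-stripping idea can be sketched with PHP's bundled DOM extension instead of Simple HTML DOM. This is a different technique from the answer above, not a drop-in replacement: the function name stripAttributes and the sample markup are hypothetical, the input is assumed to be a plain ASCII fragment, and the parser may rearrange invalid nesting (such as <p> inside <font>) when it serializes the result.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): removes every attribute
// from every element with the given tag name, using the built-in DOM extension.
function stripAttributes(string $html, string $tagName = 'p'): string
{
    $doc = new DOMDocument();

    // Collect parser warnings (e.g. about deprecated tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName($tagName) as $element) {
        // The attribute map is live, so keep removing the first entry until none remain.
        while ($element->hasAttributes()) {
            $element->removeAttributeNode($element->attributes->item(0));
        }
    }

    // Serialize only the children of <body>, i.e. the original fragment.
    $body = $doc->getElementsByTagName('body')->item(0);
    $result = '';
    foreach ($body->childNodes as $child) {
        $result .= $doc->saveHTML($child);
    }
    return $result;
}

// Assumed sample input, modelled on the markup in the question.
$sample = '<p class="a" id="b" style="c">some text</p>'
        . '<font><p class="a" id="b" style="c">some text</p></font>'
        . '<p class="a">some text</p>';

echo stripAttributes($sample);
```

Because this works on parsed nodes rather than on the tag's text, it does not depend on where the first ">" appears, which is the weak point of the substr/strpos approach when an attribute value contains that character.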