Remove HTML using PHP (ob_start + dom parser)


I need to learn how to remove HTML tags using PHP.

This is what I have in mind (I think DOM parsing is what I need, but I can't figure out how it works. A working example would be a big help for me. I can't install any external libraries and I am running PHP 5):

function the_remove_function($remove) {

    // dom parser code here?

    return $remove;
}

// return all content into a string
ob_start('the_remove_function');

Example code:

<body>
<div class="a"></div>
<div id="b"><p class="c">Here are some text and HTML</p></div>
<div id="d"></div>
</body>

Questions:

1) How do I return:

<body>
<p class="c">Here are some text and HTML</p>
</body>

2) How do I return:

<body>
<div class="a"></div>
<div id="b"></div>
<div id="d"></div>
</body>

3) How do I return:

<body>
<div class="a"></div>
<p class="c">Here are some text and HTML</p>
<div id="d"></div>
</body>

Next example code:

<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<link rel='stylesheet' id='test-css'  href='http://www.domain.com/css/test.css?ver=2011' type='text/css' media='all' />
<script type='text/javascript' src='http://www.domain.com/js/test.js?ver=2010123'></script>
</head>

4) How do I return:

<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<link rel='stylesheet' id='test-css'  href='http://www.domain.com/css/test.css?ver=2011' type='text/css' media='all' />
</head>

5) How do I return:

<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<script type='text/javascript' src='http://www.domain.com/js/test.js?ver=2010123'></script>
</head>

Thanks for reading :)


Solution

  • You can use PHP's built-in DOM classes, which are documented at https://www.php.net/manual/en/book.dom.php; there are also plenty of tutorials if you prefer those.

    Here is an example for your second case:

    <?php
    $content = '<body><div class="a"></div><div id="b"><p class="c">Here are some text and HTML</p></div><div id="d"></div></body>';
    $doc = new DOMDocument();
    $doc->loadXML($content);
    
    //Get your p element
    $p = $doc->getElementsByTagName('p')->item(0);
    //Remove the p tag from the DOM
    $p->parentNode->removeChild($p);
    
    //Save your new DOM tree
    $html = $doc->saveXML();
    
    echo $html;
    //If you want to drop the first line (the XML declaration), skip past the first newline
    echo substr($html, strpos($html, "\n") + 1);
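The other cases in the question follow the same pattern, so here is a rough sketch of how they might be handled inside the `ob_start` callback. It is only one possible approach: `remove_by_tag` and `unwrap_by_id` are hypothetical helper names, not part of PHP, and the id `b` and tag `script` are hard-coded purely to mirror cases 3 and 4. It uses `loadHTML` instead of `loadXML` on the assumption that a real page will not always be well-formed XML. Note that `loadHTML` works on whole documents, so `saveHTML` will add a doctype and any missing `html`/`body` wrappers.

```php
<?php
// Hypothetical helper: remove every element with the given tag name.
function remove_by_tag(DOMDocument $doc, $tag)
{
    $nodes = $doc->getElementsByTagName($tag);
    // The node list is live, so walk it backwards while removing.
    for ($i = $nodes->length - 1; $i >= 0; $i--) {
        $node = $nodes->item($i);
        $node->parentNode->removeChild($node);
    }
}

// Hypothetical helper: drop the element with the given id but keep its children.
// Assumes $id contains no double quotes.
function unwrap_by_id(DOMDocument $doc, $id)
{
    $xpath = new DOMXPath($doc);
    $node = $xpath->query('//*[@id="' . $id . '"]')->item(0);
    if ($node === null) {
        return;
    }
    // Move each child in front of the wrapper, then remove the empty wrapper.
    while ($node->firstChild) {
        $node->parentNode->insertBefore($node->firstChild, $node);
    }
    $node->parentNode->removeChild($node);
}

// Output-buffer callback: receives the page as a string, returns the filtered page.
function the_remove_function($buffer)
{
    if (trim($buffer) === '') {
        return $buffer;
    }

    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them into the page.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($buffer);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return $buffer; // fall back to the untouched output
    }

    unwrap_by_id($doc, 'b');        // case 3: keep the <p>, drop <div id="b">
    remove_by_tag($doc, 'script');  // case 4: drop the <script> elements

    return $doc->saveHTML();
}
```

To apply it to a whole page, register the callback before any output is sent with `ob_start('the_remove_function');`, as in the question. For case 5, calling `remove_by_tag($doc, 'link')` instead should work the same way; for case 1, the two empty divs could be removed with `removeChild` after unwrapping `b`.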