PHP: Scrape HTML page content with tags


This is the code from the original web page:

<html>
<div class="clear"></div>
<div class="slider">
<ul>  
<li>
<a title="title1" href="http://www.link.com" >
<img  title="title1"  alt=""  src="http://www.link.com/1.jpg"  /></a>
</li>
<li>
<a title="title2" href="http://www.link.com" >
<img  title="title2"  alt=""  src="http://www.link.com/2.jpg"  /></a>
</li>
</ul>
</div>
<div class="clear"></div>
</html>

I want to extract the following portion, with its tags intact:

<div class="slider">
<ul>  
<li>
<a title="title1" href="http://www.link.com" >
<img  title="title1"  alt=""  src="http://www.link.com/1.jpg"  /></a>
</li>
<li>
<a title="title2" href="http://www.link.com" >
<img  title="title2"  alt=""  src="http://www.link.com/2.jpg"  /></a>
</li>
</ul>
</div>

I checked many earlier questions but couldn't find anything similar to this, so I would appreciate any help.

Thanks


Solution

  • To manipulate HTML, it is better not to use regular expression functions such as preg_replace. Why? See this question. Instead, you can use phpQuery as an HTML parser.

    Install it with the following commands (you need pear):

    pear channel-discover phpquery-pear.appspot.com  
    pear install phpquery/phpQuery 
    

    After installing, you can do the following:

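    <?php
    // Load phpQuery; this path is an assumption and depends on where PEAR installed it.
    require_once 'phpQuery/phpQuery.php';

    // Fetch the page (replace the placeholder URL with the real one).
    $html = file_get_contents("http://www.your-url.com/");
    $pq = phpQuery::newDocumentHTML($html);
    echo $pq['.slider']; // Output every element with class="slider", including its own tags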
    

    For more example code and documentation, please take a look at the phpQuery web page.

    Edit:

    If you want to use another HTML parsing solution, you can take a look at How do you parse and process HTML/XML in PHP?
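
As one such alternative, here is a minimal sketch that uses PHP's bundled DOM extension (DOMDocument and DOMXPath) rather than phpQuery. The helper name extractByClass is hypothetical and only for illustration, and the sample markup is copied from the question; in practice the string would probably come from file_get_contents(). The class name is inserted into the XPath query as-is, so this assumes it contains no quotes.

```php
<?php
// Sample markup copied from the question.
$html = '<html>
<div class="clear"></div>
<div class="slider">
<ul>
<li>
<a title="title1" href="http://www.link.com" >
<img  title="title1"  alt=""  src="http://www.link.com/1.jpg"  /></a>
</li>
<li>
<a title="title2" href="http://www.link.com" >
<img  title="title2"  alt=""  src="http://www.link.com/2.jpg"  /></a>
</li>
</ul>
</div>
<div class="clear"></div>
</html>';

// Hypothetical helper: returns the outer HTML of every element carrying $class.
function extractByClass(string $html, string $class): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally, since real-world markup is rarely perfect.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Match $class as a whole word inside the class attribute.
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $out = '';
    foreach ($xpath->query($query) as $node) {
        // Passing a node to saveHTML() serializes that node together with its own tags.
        $out .= $doc->saveHTML($node);
    }

    return $out;
}

echo extractByClass($html, 'slider');
```

The output is re-serialized by the parser, so whitespace inside tags and self-closing slashes may differ slightly from the original source, but the element structure and attributes are preserved.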