php html parsing web-scraping simple-html-dom

Junk JavaScript and CSS code with Simple HTML DOM parser


I am using the Simple HTML DOM parser to parse a page with PHP. Below are the URL and the PHP code I am using.

URL:

https://homeshopping.pk/products/-Imported-Stretchable-Tights-For-Women--Pack-Of-3-.html

PHP Script:

$html = file_get_html('https://homeshopping.pk/products/-Imported-Stretchable-Tights-For-Women--Pack-Of-3-.html');

foreach ($html->find('div#ProductDescription_Tab') as $description)
{
    $comments = $description->find('.hsn_comments', 0);
    $comments->outertext = '';

    print $description->outertext;
}

The problem is that after running the script, the rendered output looks the way I want, but viewing the page source shows a lot of junk JavaScript and CSS code. Is that OK? Can't I get only the HTML tags, without any extra CSS or JavaScript code? Below is a screenshot of the page source after running my PHP script.

https://i.sstatic.net/78X6z.jpg


Solution

  • If you are using the latest version of Simple HTML DOM, you can use the remove() function. Here is sample code based on your existing code:

    $html = file_get_html('https://homeshopping.pk/products/-Imported-Stretchable-Tights-For-Women--Pack-Of-3-.html');

    foreach ($html->find('div#ProductDescription_Tab') as $description)
    {
        // Blank out the comments block, if it exists
        $comments = $description->find('.hsn_comments', 0);
        if ($comments) {
            $comments->outertext = '';
        }

        // Remove the divs that contain scripts; find() returns null when nothing matches
        foreach (array('div#flix-minisite', 'div#flix-inpage') as $selector) {
            $node = $description->find($selector, 0);
            if ($node) {
                $node->remove();
            }
        }

        // Remove all <script> tags
        foreach ($description->find('script') as $s) {
            $s->remove();
        }

        // Remove all <style> tags
        foreach ($description->find('style') as $s) {
            $s->remove();
        }

        echo $description->innertext;
    }
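
If your copy of Simple HTML DOM is older and does not provide remove(), a similar cleanup can probably be done with PHP's bundled DOM extension instead. The following is only a minimal sketch of that alternative, not the approach used above: stripScriptsAndStyles() and the strip-root wrapper id are hypothetical names, and the sample markup is invented for illustration. It assumes the fragment is UTF-8 and reasonably well balanced.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): strips <script> and
// <style> elements from an HTML fragment using PHP's built-in DOM extension.
function stripScriptsAndStyles(string $fragment): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of printing them;
    // real-world markup is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);

    // Wrap the fragment in a known container so it can be located again.
    // The leading XML declaration is a common trick to hint UTF-8 input.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="strip-root">' . $fragment . '</div>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Copy the matches into an array first so that removing nodes
    // cannot disturb the iteration.
    foreach (iterator_to_array($xpath->query('//script | //style')) as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $root = $xpath->query('//div[@id="strip-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }

    return $out;
}

// Invented sample fragment standing in for the product description markup.
$sample = '<p>Stretchable tights</p>'
        . '<script>var a = 1;</script>'
        . '<style>p { color: red; }</style>'
        . '<ul><li>Pack of 3</li></ul>';

$clean = stripScriptsAndStyles($sample);
echo $clean, PHP_EOL;
```

With Simple HTML DOM you could pass $description->innertext to such a helper before echoing it. Exact whitespace in the result may differ between PHP and libxml versions, so compare content rather than exact strings.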