
PHP Simple HTML DOM: How Do I Find URLs Inside JavaScript?


PHP: I am parsing some pages using Simple HTML DOM Parser. I have done a lot of the work already, but I am stuck at one point.

How do I get URLs that sit inside a JavaScript block? (The URLs are random.) Like this:

<script> 
    lstImages = array();   
    lstImages.push("abc.com/123873.php");
    lstImages.push("abc.com/125673.php");
</script>

How am I supposed to get them? Their count varies: some pages have 20, some 25, and so on.

Any help would be appreciated; I am exhausted from working on this.

Sample Code:

require "simple_html_dom.php";
$html = file_get_html('pages.html');

// Write the href of every <a> tag to links.txt, one per line.
$file = fopen("links.txt", "w");
foreach ($html->find('a') as $link) {
    fwrite($file, $link->href . "\n");
}
fclose($file);

Solution

  • A DOM parser sees the contents of a <script> tag only as plain text and does not interpret the JavaScript, so you can pull the URLs out with a regular expression instead.

    Use this:

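    // Read the raw page; the regex works on plain text, so no DOM object is needed here.
    $str = file_get_contents('pages.html');

    // Capture whatever sits between the quotes of each push("...") call.
    // [^"]* stops at the closing quote, so two calls on one line are not merged.
    $re = '/push\("([^"]*)"\)/';

    preg_match_all($re, $str, $matches);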
    

    $matches[1] now contains an array of your URLs ($matches[0] holds the full push("...") matches).
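
To tie this back to the question's goal of saving the links to links.txt, here is a minimal, self-contained sketch. It assumes the page looks like the snippet in the question: the inline `$page` string is a hypothetical stand-in for the contents of pages.html, and anchoring the pattern on the `lstImages` name is an assumption taken from that snippet.

```php
<?php
// Hypothetical sample page, standing in for the contents of pages.html.
$page = <<<'HTML'
<script>
    lstImages = array();
    lstImages.push("abc.com/123873.php");
    lstImages.push("abc.com/125673.php");
</script>
HTML;

// Match only lstImages.push("...") calls and capture the quoted value.
// [^"]* cannot run past the closing quote, so each call yields one URL.
preg_match_all('/lstImages\.push\("([^"]*)"\)/', $page, $matches);

// Index 1 holds the first capture group, i.e. the URLs themselves.
$urls = $matches[1];

// Write one URL per line, mirroring the loop in the question.
file_put_contents('links.txt', implode("\n", $urls) . "\n");
```

Because preg_match_all collects every match, the number of push calls on a page does not matter. If other scripts on the page also call push, including the array name in the pattern (as above) keeps their values out of the result.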