PHP Simple HTML DOM Parser: Get all posts

I'd like to get all articles from a webpage, as well as all pictures for each article.

I decided to use PHP Simple HTML DOM Parser and wrote the following code:



// Assumption: the Simple HTML DOM library file sits next to this script.
require_once 'simple_html_dom.php';

$sitesToCheck = array(
    array(
        'url' => '',
        'search_element' => 'h2.title a',
        'get_element' => ''
    ),
    // array(
    //     'url' => '',            // Site address with a list of articles
    //     'search_element' => '', // Link of Article on the site
    //     'get_element' => ''     // desired content
    // ),
);

$s = microtime(true);

foreach ($sitesToCheck as $site) {
    $html = file_get_html($site['url']);
    if ( ! $html) {
        continue; // listing page could not be fetched or parsed
    }

    foreach ($html->find($site['search_element']) as $link) {
        $content   = '';
        $savePath  = 'cachedPages/'.md5($site['url']).'/';
        $fileName  = md5($link->href);

        if ( ! file_exists($savePath.$fileName)) {
            $post_for_scan = file_get_html($link->href);
            if ( ! $post_for_scan) {
                continue; // skip articles that fail to load
            }

            foreach ($post_for_scan->find($site["get_element"]) as $element) {
                $content .= $element->plaintext . PHP_EOL;
            }

            // Mode 0755 rather than 0, which would create an inaccessible directory
            if ( ! file_exists($savePath) && ! mkdir($savePath, 0755, true)) {
                die('Unable to create directory ...');
            }

            file_put_contents($savePath.$fileName, $content);
        }
    }
}

$e = microtime(true);

echo $e-$s;

For now I am trying to get only the articles, without pictures, but the server responds with:

"Maximum execution time of 120 seconds exceeded"


What am I doing wrong? Is there any other way to get all the articles, and all pictures for each article, from a specific webpage?


  • I had similar problems with that lib. Use PHP's DOMDocument instead:

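    $doc = new DOMDocument;
    // $url is a placeholder for the address of the page to parse
    @$doc->loadHTMLFile($url);   // @ silences warnings about malformed markup
    $links = $doc->getElementsByTagName('a');
    foreach ($links as $link) {
      doSomethingWith($link->getAttribute('href'), $link->nodeValue);
    }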
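
The DOMDocument approach can also collect the pictures. This is a rough, self-contained sketch that pulls every link and every image source out of an HTML string using DOMDocument and DOMXPath. The function name `extractLinksAndImages` and the sample markup are made up for illustration; on a real site the XPath expressions would need to match that site's actual structure.

```php
<?php
// Hypothetical helper (not part of any library): returns every <a href>
// and every <img src> found in the given HTML string.
function extractLinksAndImages(string $html): array
{
    $doc = new DOMDocument();

    // Real-world markup is rarely valid; collect parser warnings
    // internally instead of printing them, then discard them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    $links = [];
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->nodeValue),
        ];
    }

    $images = [];
    foreach ($xpath->query('//img[@src]') as $img) {
        $images[] = $img->getAttribute('src');
    }

    return ['links' => $links, 'images' => $images];
}

// Sample markup, invented for this example.
$sample = '<div class="post"><h2 class="title"><a href="/post-1">First post</a></h2>'
        . '<img src="/img/a.png" alt=""><img src="/img/b.png" alt=""></div>';

$result = extractLinksAndImages($sample);

foreach ($result['links'] as $link) {
    echo $link['href'], ' ', $link['text'], PHP_EOL;
}
foreach ($result['images'] as $src) {
    echo $src, PHP_EOL;
}
```

To use it on a live page, the HTML string would first have to be downloaded (for example with `file_get_contents`), and relative `href`/`src` values would have to be resolved against the page address before fetching them.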
