php screen-scraping html-parsing simple-html-dom

HTML DOM Parser - How to get the first post of all topics in a forum


I am trying to scrape the first post of every topic in the SitePoint JavaScript forum, but the DOM parser gives me all the posts of every topic instead. Maybe I am not traversing the DOM correctly? Below is my code:

<?php

class Sitepoint extends Controller
{
    public function index()
    {
        $this->load->helper('dom');
        header('Content-Type: text/html; charset=utf-8');
        echo '<ol>';

            $html = file_get_html('http://www.sitepoint.com/forums/javascript-15');

            foreach($html->find('a[id^="thread_title"]') as $topic) {
                $post =$topic->href;
                $posthtml = file_get_html($post);
                $posthtml->find('div[id^="post_message"]', 0);
                echo'<li>';
                echo $topic->plaintext.'<br>';
                echo $posthtml->plaintext.'<br>';
                echo'</li>';
            }
        echo '</ol>';
    }
}

Solution

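  • You forgot to assign the result of $posthtml->find() to a variable. Because the match is discarded, $posthtml->plaintext prints the text of the whole topic page (every post) instead of the first post. Store the matched element and print its plaintext: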

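    foreach ($html->find('a[id^="thread_title"]') as $topic) {
        $posthtml = file_get_html($topic->href);
        if (!$posthtml) {
            continue; // skip topics whose page could not be loaded
        }
        // Index 0 returns only the first matching element, i.e. the first post.
        $posttext = $posthtml->find('div[id^="post_message_"]', 0);
        echo '<li>';
        echo $topic->plaintext . '<br>';
        // find() returns null when nothing matches, so guard before reading plaintext.
        echo ($posttext ? $posttext->plaintext : '') . '<br>';
        echo '</li>';
    }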
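
The same "keep only the first match" idea can be tried without fetching the forum. The following is a minimal sketch that swaps in PHP's bundled DOM extension (DOMDocument and DOMXPath) for Simple HTML DOM, so it is not the library used in the question. The helper name firstPostText and the sample markup are hypothetical, made up for illustration only.

```php
<?php
// Hypothetical helper: returns the text of the first <div> whose id
// starts with "post_message_", or null when no such element exists.
function firstPostText(string $pageHtml): ?string
{
    $doc = new DOMDocument();
    // Suppress the warnings that real-world, non-validating HTML tends to trigger.
    @$doc->loadHTML($pageHtml);

    $xpath = new DOMXPath($doc);
    // XPath counterpart of the CSS selector div[id^="post_message_"].
    $nodes = $xpath->query('//div[starts-with(@id, "post_message_")]');

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // Read the text of the matched element, not of the whole document.
    return trim($nodes->item(0)->textContent);
}

// Made-up sample page containing two posts.
$sample = '<html><body>'
    . '<div id="post_message_1">First post</div>'
    . '<div id="post_message_2">A reply</div>'
    . '</body></html>';

echo firstPostText($sample), "\n";
```

Reading the text from the matched node rather than from the document is the point of the fix: the document's text would include the reply as well.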