
Data Scraping Problem

I am scraping wall posts from a Facebook page. Here is the URL: !/GMHTheBook?v=wall&ref=ts

I successfully scraped all the visible wall posts using cURL.


At the end of the visible wall posts there is an "Older Posts" link, which shows more wall posts when you click it. How do I programmatically "click" that link to load more wall posts and scrape those as well?

Is there any way to do this? I am using cURL, but I am open to any approach that handles this situation.


I am now using the following code to fetch the data, find the "next" link, fetch the data at that URL, and so on:

ini_set('display_errors', true);

$data = json_decode(file_get_contents($url), true);

$names = array();
$stories = array();

foreach($data['data'] as $post) {
    $names[] = $post['from']['name'];
    $stories[] = $post['message'];
}

$url = $data['paging']['next'];

// this is meant to scrape data recursively from the next links
while($url !== '') {
    $url = $data['paging']['next'];
    $data = json_decode(file_get_contents($url), true);

    foreach($data['data'] as $post) {
        $names[] = $post['from']['name'];
        $stories[] = $post['message'];
    }

    $url = urldecode($data['paging']['next']);
    echo $url . '<br />';
}

for($j = 0; $j < count($names); $j++) {
    $data .= $names[$j] . '|' . $stories[$j] . "\n";
}

$h = fopen("data.txt", "a+");
fwrite($h, $data);

The problem is that the script keeps running with no output at all, and no file is created. I have raised the script's time limit, and allow_url_fopen is enabled. Is something wrong with the script, or am I not doing the recursion correctly? Is there a fix or an alternative?


  • You should use the Graph API. The data you are scraping is available in JSON format at

    and includes links to the previous and next pages under the paging key.


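    $data = json_decode(file_get_contents($url));
    foreach ($data->data as $post) {
        // Some posts carry no message, so guard the property before printing it
        echo $post->from->name, ': ', isset($post->message) ? $post->message : '', "\n";
    }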

    The above outputs the posts on the first page of results. For paging, use

    echo $data->paging->previous;
    echo $data->paging->next;

    This outputs two URLs. To get older posts, load the next URL the same way, and keep doing so until a page comes back without a next link or with no posts.
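A minimal PHP sketch of "loading them again" in a loop follows. It assumes PHP 7.1 or later, and that each page decodes (with json_decode(..., true)) to an array holding a data list and an optional paging entry whose next key is missing on the last page, as both snippets above assume. The names collectPosts, formatPosts and fetchJson are hypothetical helpers, and the page fetcher is passed in as a callable so the loop can be tried without network access. The loop stops on a missing next link, an empty or failed page, a repeated URL, or a page cap, so it cannot run forever.

```php
<?php
// Sketch: follow "paging.next" links until there are no more pages.
// Assumption: each page decodes to an array with a "data" list and an
// optional "paging" array; the last page has no "next" key.

/**
 * Collect name/message pairs from every page, starting at $firstUrl.
 * $fetchPage is an injected callable (url => decoded array or null),
 * so the loop can be exercised without network access.
 */
function collectPosts(string $firstUrl, callable $fetchPage, int $maxPages = 100): array
{
    $posts = [];
    $seen = [];
    $url = $firstUrl;

    // Stop when there is no next URL, when a URL repeats, or after $maxPages pages.
    while ($url !== null && $url !== '' && count($seen) < $maxPages && !isset($seen[$url])) {
        $seen[$url] = true;
        $page = $fetchPage($url);
        if (!is_array($page) || empty($page['data'])) {
            break; // failed request or empty page: nothing more to read
        }

        foreach ($page['data'] as $post) {
            $posts[] = [
                'name'    => $post['from']['name'] ?? '',
                'message' => $post['message'] ?? '', // some posts have no message
            ];
        }

        // A missing "next" key means this was the last page.
        $url = $page['paging']['next'] ?? null;
    }

    return $posts;
}

/** Format posts as "name|message" lines, the layout the question writes to data.txt. */
function formatPosts(array $posts): string
{
    $out = '';
    foreach ($posts as $p) {
        // Flatten line breaks inside a message so each post stays on one line.
        $message = str_replace(["\r\n", "\n", "\r"], ' ', $p['message']);
        $out .= $p['name'] . '|' . $message . "\n";
    }
    return $out;
}

/** Default fetcher built on file_get_contents (needs allow_url_fopen). */
function fetchJson(string $url): ?array
{
    $body = @file_get_contents($url);
    if ($body === false) {
        return null;
    }
    $decoded = json_decode($body, true);
    return is_array($decoded) ? $decoded : null;
}
```

Possible usage, with $url holding the first page's Graph API URL:

    $posts = collectPosts($url, 'fetchJson');
    file_put_contents('data.txt', formatPosts($posts), FILE_APPEND | LOCK_EX);

file_put_contents with FILE_APPEND appends like the "a+" fopen mode in the question, and it opens, writes and closes the file in one call.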