Scrape a website URL to get the path of an image


I'm hacking together a simple PHP script that builds a list of the photo albums on my Facebook fan page.

Facebook kindly offers the Graph API, which gives me back a nice list of albums, but it no longer provides the path of each album's default image.

I want to write a PHP script that loads an album URL via cURL and grabs the path of the first image in the table that contains the thumbnails. This would be the "src" value of the first img tag that has a class of "UIPhotoGrid_Image".

The block of layout code that contains the good stuff looks like this:

<div id="album_container">
    <div class="UIPhotoGrid_Container UIPhotoGrid_DefaultPadding">
        <table class="UIPhotoGrid_Table" cellpadding="0" cellspacing="0">
            <tr>
                <td class="UIPhotoGrid_TableCell">
                    <a class="UIPhotoGrid_PhotoLink clearfix" href="http://www.facebook.com/photo.php?pid=5004658&amp;id=20785087272"><img class="UIPhotoGrid_Image img" src="http://photos-e.ak.fbcdn.net/hphotos-ak-snc4/hs080.snc4/35354_422883027272_20785087272_5004658_704231_s.jpg" onload="this.fb_loaded = true;" /></a>
                </td>
                <td class="UIPhotoGrid_TableCell">
                    <a class="UIPhotoGrid_PhotoLink clearfix" href="http://www.facebook.com/photo.php?pid=5004659&amp;id=20785087272"><img class="UIPhotoGrid_Image img" src="http://photos-c.ak.fbcdn.net/hphotos-ak-snc4/hs080.snc4/35354_422883032272_20785087272_5004659_6158094_s.jpg" onload="this.fb_loaded = true;" /></a>
                </td>
                <td class="UIPhotoGrid_TableCell">
                    <a class="UIPhotoGrid_PhotoLink clearfix" href="http://www.facebook.com/photo.php?pid=5004660&amp;id=20785087272"><img class="UIPhotoGrid_Image img" src="http://photos-f.ak.fbcdn.net/hphotos-ak-snc4/hs080.snc4/35354_422883037272_20785087272_5004660_1787119_s.jpg" onload="this.fb_loaded = true;" /></a>
                </td>
            </tr>
        </table>
    </div>
</div>

Sadly, this is beyond my current coding abilities. Any ideas?


Solution

  • You could use PHP Simple HTML DOM Parser to grab the path using jQuery-style selector syntax.

    Note: Facebook probably serves images from several server clusters, so the URL of a given photo may change over time.
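
As a rough illustration of the same idea without a third-party library, here is a minimal sketch that uses PHP's bundled DOM extension (DOMDocument and DOMXPath) in place of PHP Simple HTML DOM Parser. The function names firstAlbumThumbnail and fetchPage and the variable $albumUrl are hypothetical, and the class name "UIPhotoGrid_Image" is taken from the markup in the question, so the query only works while Facebook keeps that markup.

```php
<?php
// Hypothetical helper: returns the "src" of the first <img> whose class list
// contains "UIPhotoGrid_Image", or null if no such image is found.
function firstAlbumThumbnail(string $html): ?string
{
    // loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return null;
    }

    $doc = new DOMDocument();
    // Real-world pages rarely validate; collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Match the class as a whole word, so "UIPhotoGrid_Image img" matches
    // but a longer name such as "UIPhotoGrid_ImageX" does not.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(
        '//img[contains(concat(" ", normalize-space(@class), " "), " UIPhotoGrid_Image ")]'
    );

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    $src = $nodes->item(0)->getAttribute('src');
    return $src !== '' ? $src : null;
}

// Hypothetical helper: fetches a page body with cURL, or null on failure.
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 10,   // seconds
    ]);
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

// Usage (assumes $albumUrl holds an album link returned by the Graph API):
// $thumbnail = firstAlbumThumbnail(fetchPage($albumUrl) ?? '');
```

Because the image host can differ between requests, it is probably safer to look the thumbnail up when it is needed than to store the URL for long.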