
Using PHP to scrape image url from twitter page


I'm trying to scrape an image URL from Twitter, e.g. 'https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg:large', using PHP. I found the following PHP code; file_get_contents is working, but I don't think the regular expression is matching the URL. Can you help debug this code? Thanks in advance.

Here is a snippet from the Twitter page which contains the image:

<div class="media-gallery-image-wrapper">
     <img class="large media-slideshow-image" alt="" src="https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg:large" height="480" width="358">
 </div>

Here is the PHP code:

<?php
    $url = 'http://t.co/s54fJgrzrG';
    $twitter_page = file_get_contents($url);
    preg_match('/(http:\/\/p.twimg.com\/[^:]+):/i', $twitter_page, $matches);
    $imgURL = array_pop($matches); 
    echo $imgURL;
?>

Solution

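  • The pattern in the question looks for http://p.twimg.com, but the page serves the image from https://pbs.twimg.com, so it never matches. Something like this should provide a URL.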

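    <?php
        $url = 'http://t.co/s54fJgrzrG';
        $twitter_page = file_get_contents($url);
        // Accept http or https on pbs.twimg.com and stop before the ":large" suffix.
        $pattern = '!https?://pbs\.twimg\.com/[^:"\s]+\.(jpg|png|gif)!i';
        if ($twitter_page !== false && preg_match_all($pattern, $twitter_page, $matches)) {
            // $matches[0] holds every full match; print the first one.
            echo $img_url = $matches[0][0];
        }
    ?>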
    

    Response is

    https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg
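
To check the pattern without making a live request, the same idea can be run against the markup quoted in the question. This is only a sketch: the helper name extract_twimg_urls is hypothetical, the character class is tightened so a match cannot run past the src attribute, and it assumes the page still contains markup like the snippet above, which may no longer be true of Twitter's current pages.

```php
<?php
// Hypothetical helper (not part of the original answer): returns every
// pbs.twimg.com image URL found in an HTML string, without the ":large" suffix.
function extract_twimg_urls(string $html): array
{
    // '!' is the delimiter, so forward slashes need no escaping. The character
    // class stops at a colon, quote, or whitespace.
    $pattern = '!https?://pbs\.twimg\.com/[^:"\'\s]+\.(?:jpg|png|gif)!i';
    if (!preg_match_all($pattern, $html, $matches)) {
        return []; // no match (0) or regex error (false)
    }
    // Drop duplicates and reindex from 0.
    return array_values(array_unique($matches[0]));
}

// The markup quoted in the question, used here instead of a live request.
$html = '<div class="media-gallery-image-wrapper">
    <img class="large media-slideshow-image" alt="" src="https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg:large" height="480" width="358">
</div>';

$urls = extract_twimg_urls($html);
echo $urls[0] ?? 'no image URL found', PHP_EOL;
```

Returning an array rather than echoing inside the function keeps the "nothing found" case explicit, which is the case the question ran into.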