php web-scraping curl ipad user-agent

How to scrape a site using an iPad User-Agent?


How can I scrape a site while sending an iPad User-Agent?

I have the PHP code below, which uses curl and outputs the page source, but I still can't find the tags in it. On an iPad, or in Safari with an iPad User-Agent, the tags are displayed when the site loads.

Thanks!

<?php
    // iPad (iOS 3.2) Mobile Safari user-agent string
    $useragent = "Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10";

    $ch = curl_init("http://www.cbsnews.com/video/watch/?id=7370279n&tag=mg;mostpopvideo");

    curl_setopt($ch, CURLOPT_USERAGENT, $useragent);   // send the iPad user agent
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);    // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);    // follow HTTP redirects, as a browser would

    $output = curl_exec($ch);
    if ($output === false) {
        echo "curl error: " . curl_error($ch);
    } else {
        echo $output;
    }

    curl_close($ch);
?>
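curl only returns the HTML the server sends, so a quick way to confirm whether the tags are really missing is to parse the response and count them instead of searching the echoed source by eye. This is a minimal sketch, assuming PHP's bundled DOM extension is enabled; the helper name count_tags is hypothetical, and you would pass it whichever tag you are looking for, e.g. count_tags($output, 'video').

```php
<?php
// Hypothetical helper: count the elements with a given tag name in an HTML
// string, such as the body returned by curl_exec().
function count_tags(string $html, string $tag): int
{
    if ($html === '') {
        return 0; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world markup is rarely valid; suppress libxml warnings while parsing
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc->getElementsByTagName($tag)->length;
}

// Example with an inline snippet instead of a live request
$sample = '<html><body><a href="/x">x</a><a href="/y">y</a><p>text</p></body></html>';
echo count_tags($sample, 'a'), "\n"; // prints 2
```

A count of 0 for a tag you can see on the iPad means the markup is not in the response curl received, rather than being hidden somewhere in the source.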

Solution

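  • Try driving command-line curl from a Perl script such as the one below. It fetches the page with the iPad user agent (-A), follows redirects (-L), and downloads every file linked on the same server, sending the start page as the referer (-e):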

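    use strict;
    use warnings;
    use File::Path qw(make_path);

    my $ua = "Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10";
    my $server = "http://www.cbsnews.com";
    my $startpage = "$server/video/watch/?id=7370279n&tag=mg;mostpopvideo";
    my $path = "/path/to/download/to";

    # Helper: wrap an argument in single quotes for the shell (the URL contains & and ;)
    sub sh_quote { my $s = shift; $s =~ s/'/'\\''/g; return "'$s'"; }

    my $curl = "curl -A " . sh_quote($ua);        # send the iPad user agent
    my $referer = "-e " . sh_quote($startpage);   # send the start page as the Referer

    # -L follows redirects; read the page line by line from curl's output
    open(my $fh, '-|', "$curl -L " . sh_quote($startpage)) or die "Cannot open website: $!";
    while (my $line = <$fh>)
    {
        # Link to a file at the server root: href="http://www.cbsnews.com/file"
        if ($line =~ /<a\s+[^>]*href="\Q$server\E\/([^"\/]+)"/)
        {
            my $file = $1;
            system("$curl $referer " . sh_quote("$server/$file") . " > " . sh_quote("$path/$file"));
            next;
        }

        # Link to a file in a subfolder: href="http://www.cbsnews.com/folder/file"
        if ($line =~ /<a\s+[^>]*href="\Q$server\E\/([^"]+)\/([^"\/]+)"/)
        {
            my $folder = $1;
            my $file = "$folder/$2";
            make_path("$path/$folder");
            system("$curl $referer " . sh_quote("$server/$file") . " > " . sh_quote("$path/$file"));
            next;
        }
    }
    close($fh);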
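The same link selection can be done in PHP without shelling out. This is a sketch, not a line-by-line port of the Perl script: it swaps the regexes for PHP's DOMDocument parser, so it also finds several links on one line or links whose markup spans lines. The function name server_links is hypothetical, and it only returns the paths relative to the server; downloading each one (with CURLOPT_USERAGENT and CURLOPT_REFERER set, after creating the directory with mkdir($dir, 0777, true)) is left to the caller.

```php
<?php
// Hypothetical helper: return the paths of all <a href="..."> links that point
// into $server, relative to it (e.g. "video/b.html"), without duplicates.
function server_links(string $html, string $server): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Suppress libxml warnings about invalid real-world markup
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $prefix = rtrim($server, '/') . '/';
    $paths = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        // Keep links on the same server that name a file (not ending in "/")
        if (strncmp($href, $prefix, strlen($prefix)) === 0 && substr($href, -1) !== '/') {
            $paths[] = substr($href, strlen($prefix));
        }
    }
    return array_values(array_unique($paths));
}

// Example with an inline snippet instead of a live request
$html = '<a href="http://www.cbsnews.com/a.html">A</a>'
      . '<a href="http://www.cbsnews.com/video/b.html">B</a>'
      . '<a href="http://example.com/c.html">C</a>';
print_r(server_links($html, 'http://www.cbsnews.com'));
```

For each returned path, dirname($path) gives the folder to create, which mirrors the two cases (root file vs. file in a subfolder) that the Perl script handles with separate regexes.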