My client writes blogs on Sina Blog and she is only comfortable with its editor. So after she submits a post, I use a small snippet to scrape the images and text over to her own blog website. Its core is:
$url = 'http://s5.sinaimg.cn/bmiddle/001MEJWgzy7xxRaXmDyd4&690';
// Fetch the raw image bytes (@ suppresses warnings on failure)
$img_data = @file_get_contents($url);
if ($img_data !== false) {
    file_put_contents('1.jpg', $img_data);
}
As weird as it sounds, it worked very well and saved us both tons of time. But recently the images have all come back blank with some watermarks. I guess Sina finally detected our little dirty trick and blocked the images from being scraped. I am curious how the block is done and, more importantly, whether there is any way to work around it. I've tried
wget 'http://s5.sinaimg.cn/bmiddle/001MEJWgzy7xxRaXmDyd4&690'
(note the URL has to be quoted, otherwise the shell treats the & as a background operator), but it also only gets the blank image.
Just a suggestion - the easiest (and most likely) way a site detects a scraper is by looking at the request headers, most commonly "Accept", "Referer" (the HTTP header name is famously misspelled) and "User-Agent". You could try copying the values that your "real" browser sends (visible in the Network tab of the browser's dev tools) and plugging them into the wget call, like so:
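For example (a sketch: the header values below are placeholders, so copy the real ones from your own browser; in particular, the Referer pointing back at a Sina blog page is my guess at what their hotlink check wants):

wget --referer='http://blog.sina.com.cn/' \
     --user-agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36' \
     --header='Accept: image/avif,image/webp,image/*,*/*;q=0.8' \
     -O 1.jpg \
     'http://s5.sinaimg.cn/bmiddle/001MEJWgzy7xxRaXmDyd4&690'

And since your snippet uses file_get_contents, the same headers can be sent straight from PHP with a stream context, so you don't need to shell out to wget at all (same placeholder values as above):

// Build an HTTP context carrying the browser-like headers
$context = stream_context_create([
    'http' => [
        'header' => "Referer: http://blog.sina.com.cn/\r\n"
                  . "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36\r\n"
                  . "Accept: image/avif,image/webp,image/*,*/*;q=0.8\r\n",
    ],
]);
// Pass the context as the third argument to file_get_contents
$img_data = @file_get_contents($url, false, $context);

If the watermarked placeholder goes away with the Referer set, you'll know it was Referer-based hotlink protection; if not, experiment with the other headers one at a time.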
Hope that helps!