I'm using cURL through proxies to download images with a scraper I have developed.
Unfortunately, it occasionally gets an image that looks like these, and the last one is completely blank :/
Does anyone know a way to determine whether an image is mostly grey or completely blank/white? And are these indeed corrupted images?
I have checked a lot of other questions on here, but I haven't had much luck with their solutions, so please take care before marking this as a duplicate.
Thanks
After learning about imagecolorat, I did a search, stumbled on some code, and came up with this:
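<?php
// Load the downloaded JPEG; GD returns false if it cannot decode the file at all.
$file = dirname(__FILE__) . "/images/1.jpg";
$img = imagecreatefromjpeg($file);
if ($img === false) {
    echo "image corrupted (could not be decoded) \n";
    exit(1);
}

$imagew = imagesx($img);
$imageh = imagesy($img);

// Sample only the bottom 5 rows (or the whole image if it is shorter than that).
$last_height = max(0, $imageh - 5);
$foo = array();

// Valid coordinates are 0..width-1 and 0..height-1, so the loops use < rather
// than <= (the <= version reads past the edge, which the @ was hiding).
for ($x = 0; $x < $imagew; $x++) {
    for ($y = $last_height; $y < $imageh; $y++) {
        $rgb = imagecolorat($img, $x, $y);
        // Only the red channel is examined; pixels with red == 0 are skipped.
        $r = ($rgb >> 16) & 0xFF;
        if ($r != 0) {
            $foo[] = $r;
        }
    }
}

// Count the sampled red values that fall in the 127-129 "grey" band.
$bar = array_count_values($foo);
$gray = ($bar[127] ?? 0) + ($bar[128] ?? 0) + ($bar[129] ?? 0);
$total = count($foo);
$other = $total - $gray;

if ($gray > $other) {
    echo "image corrupted \n";
} else {
    echo "image not corrupted \n";
}
?>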
Does anyone see any potential pitfalls with this? My idea is to take the last few rows of the image and compare the number of pixels whose red value is 127, 128 or 129 (i.e. grey) against the number of all other pixels. If grey outnumbers the other colours, then the image is almost certainly corrupted.
Opinions welcome! :)
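One likely pitfall with the check above is that it looks only at the red channel, so a pixel such as (128, 0, 255) would be counted as grey. A minimal sketch that requires all three channels to sit close to the same level might look like this. The function names, the tolerance of 2, and the idea of reusing the same test with level 255 for blank/white images are assumptions for illustration, not anything established in the question.

```php
<?php
// Returns true when all three channels of a packed RGB value (as returned by
// imagecolorat() on a truecolor image) are within $tolerance of $level.
// Hypothetical helper; the default tolerance of 2 is an assumption.
function isNearLevel(int $rgb, int $level, int $tolerance = 2): bool
{
    $r = ($rgb >> 16) & 0xFF;
    $g = ($rgb >> 8) & 0xFF;
    $b = $rgb & 0xFF;
    foreach ([$r, $g, $b] as $channel) {
        if (abs($channel - $level) > $tolerance) {
            return false;
        }
    }
    return true;
}

// Fraction (0.0 to 1.0) of the given pixels that are near $level.
// Use 128 for the grey fill, 255 for a blank white image.
function fractionNearLevel(array $pixels, int $level, int $tolerance = 2): float
{
    if (count($pixels) === 0) {
        return 0.0;
    }
    $near = 0;
    foreach ($pixels as $rgb) {
        if (isNearLevel($rgb, $level, $tolerance)) {
            $near++;
        }
    }
    return $near / count($pixels);
}

// Collects packed RGB values from the bottom $rows rows of a GD image.
function bottomRowPixels($img, int $rows = 5): array
{
    $w = imagesx($img);
    $h = imagesy($img);
    $pixels = [];
    for ($y = max(0, $h - $rows); $y < $h; $y++) {
        for ($x = 0; $x < $w; $x++) {
            $pixels[] = imagecolorat($img, $x, $y);
        }
    }
    return $pixels;
}
```

For example, `fractionNearLevel(bottomRowPixels($img), 128) > 0.5` would flag a mostly grey bottom strip, while passing `imagesy($img)` as the row count and checking level 255 would give the share of near-white pixels across the whole image, which should be close to 1.0 for a blank one.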
If the returned image is a valid file, then I would recommend running the scrape twice (i.e. downloading it twice and checking whether the two copies are identical).
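The comparison step itself can be done by hashing both copies with PHP's standard `hash_file()` and comparing the digests. This is a sketch assuming both downloads (for example through two different proxies) have already been saved to disk; the function name is hypothetical.

```php
<?php
// Returns true when the two files have identical contents, judged by their
// SHA-1 digests. Hypothetical helper for the "download twice" check.
function downloadsMatch(string $pathA, string $pathB): bool
{
    $hashA = hash_file('sha1', $pathA);
    $hashB = hash_file('sha1', $pathB);
    // hash_file() returns false if a file cannot be read; treat that as a mismatch.
    if ($hashA === false || $hashB === false) {
        return false;
    }
    return $hashA === $hashB;
}
```

If the two copies differ, at least one of them may be truncated, so re-downloading is safer than trusting either copy.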
Another option would be to check the last few pixels of the image (i.e. the bottom-right corner) to see whether they match that exact shade of grey. If they do, re-download the image. (Obviously this approach fails if the image is genuinely supposed to be that exact grey in that corner, but checking several of the last pixels should reduce the chance of that to an acceptable level.)
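A rough sketch of the corner check, under a few assumptions: the fill colour is exactly (128, 128, 128), the last 10 pixels of the bottom row are enough, and pixels are read through a callable so the function can be tried without GD. With GD you would pass something like `function ($x, $y) use ($img) { return imagecolorat($img, $x, $y); }`. The function name and defaults are hypothetical.

```php
<?php
// Returns true when the last $count pixels of the bottom row are all exactly
// $grey, which suggests a truncated download. $colorAt(x, y) must return a
// packed RGB int; the default grey 0x808080 is an assumption.
function cornerIsGrey(callable $colorAt, int $width, int $height, int $count = 10, int $grey = 0x808080): bool
{
    if ($width < 1 || $height < 1) {
        return false;                   // nothing to inspect
    }
    $y = $height - 1;                   // bottom row
    $start = max(0, $width - $count);   // leftmost pixel of the corner strip
    for ($x = $start; $x < $width; $x++) {
        // Mask off GD's alpha bits (24-30) before comparing the colour.
        if (($colorAt($x, $y) & 0xFFFFFF) !== $grey) {
            return false;               // one non-grey pixel means the corner looks fine
        }
    }
    return true;                        // every inspected pixel was exactly grey
}
```

Raising `$count` (or also checking a few rows above the bottom one) makes a false positive on a genuinely grey corner less likely, at the cost of a few more `imagecolorat()` calls.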