Tags: image, download, scrapy, size

Scrapy: get image size without downloading


I want to get an image's size without downloading the image. Is that possible?

Example image URL (image1): https://koctas-img.mncdn.com/mnresize/600/600/productimages/1000599303/1000599303_1_MC/8843182866482_1663925809606.jpg

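from scrapy import Request


def parse_product(self, response):
    images = response.css(".swiper-slide::attr(data-large)").getall()
    image1 = images[0]
    # Inside a Scrapy callback, yield hands the Request to the engine and the
    # yield expression evaluates to None, so image_size never receives the response.
    image_size = yield Request(image1, method="HEAD", callback=self.callback)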

Solution

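  • Send an HTTP HEAD request instead of a GET. The server replies with the response headers only, so the image body is not downloaded, and the Content-Length header (when the server includes it) gives the file size in bytes. Read the size in the callback that receives the response (here the default parse method), not from the value of yield.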

    import scrapy
    
    
    class ExampleSpider(scrapy.Spider):
        name = 'example_spider'
    
        def start_requests(self):
            images_urls = [
                'http://wallpapercave.com/wp/wp1809904.jpg',
                'https://i2.wp.com/www.otakutale.com/wp-content/uploads/2015/10/One-Punch-Man-Anime-Magazine-Visual-01.jpg',
                'https://thedeadtoons.com/wp-content/uploads/2020/06/One-Punch-Man-Season-3.jpg'
            ]
            for url in images_urls:
                # HEAD asks for the headers only; the image body is not transferred.
                yield scrapy.Request(url=url, method='HEAD')
    
        def parse(self, response, **kwargs):
            # Scrapy header values are bytes. .get() returns None instead of
            # raising KeyError when the server omits Content-Length.
            yield {
                'Content-Length': response.headers.get('Content-Length')
            }
    

    Output:

    [scrapy.core.scraper] DEBUG: Scraped from <200 https://thedeadtoons.com/wp-content/uploads/2020/06/One-Punch-Man-Season-3.jpg>
    {'Content-Length': b'179681'}
    [scrapy.core.scraper] DEBUG: Scraped from <200 https://i2.wp.com/www.otakutale.com/wp-content/uploads/2015/10/One-Punch-Man-Anime-Magazine-Visual-01.jpg>
    {'Content-Length': b'1847153'}
    [scrapy.core.scraper] DEBUG: Scraped from <200 https://wallpapercave.com/wp/wp1809904.jpg>
    {'Content-Length': b'246144'}
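The values above are raw bytes (b'179681'), which is how Scrapy returns header values, and the header can be missing entirely (for example when a server uses chunked transfer encoding). Below is a minimal sketch of normalizing the value to an int, assuming a missing or non-numeric header should become None. The helper names parse_content_length and head_size are hypothetical, and head_size uses the standard library's urllib.request instead of Scrapy, only to show the same HEAD technique outside a spider.

```python
import urllib.request
from typing import Optional, Union


def parse_content_length(value: Union[bytes, str, None]) -> Optional[int]:
    """Convert a raw Content-Length header value to an int (hypothetical helper).

    Scrapy gives header values as bytes (e.g. b'179681'); urllib gives str.
    Returns None if the header is missing or is not a plain non-negative integer.
    """
    if value is None:
        return None
    if isinstance(value, bytes):
        # Header values are ASCII; undecodable bytes become U+FFFD and fail below.
        value = value.decode("ascii", errors="replace")
    value = value.strip()
    # isascii() rejects Unicode digits such as '²' that isdigit() would accept.
    if not (value.isascii() and value.isdigit()):
        return None
    return int(value)


def head_size(url: str, timeout: float = 10.0) -> Optional[int]:
    """Send a HEAD request with urllib and return the reported size in bytes.

    Hypothetical helper: only the response headers are transferred, not the body.
    Returns None when the server does not send a usable Content-Length.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_content_length(response.headers.get("Content-Length"))
```

For example, parse_content_length(b'179681') returns 179681. Inside the spider above, the same helper could wrap response.headers.get('Content-Length') to yield an integer instead of bytes.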