
Don't load images, CSS, themes, or scripts in Scrapy


I need to prevent images, CSS files, themes, and scripts from being loaded from a web page during a basic scrape with Scrapy.

Is there a way to block them from settings.py or somewhere else?

import scrapy

class MySpyder(scrapy.Spider):
    name = 'Spiderr'
    # One URL per line in the "Archive" file; blank lines are skipped
    with open("Archive") as f:
        start_urls = [line.strip() for line in f if line.strip()]

    def parse(self, response):
        tittle = response.xpath("/html/body/").get('').strip()
        url = response.url
        yield {
            'tittle': tittle,
            'URL': url,
        }

I assume that would put less load on the website.


Solution

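  • Scrapy only downloads the source of the URL you request, so the response contains just that page's HTML.

    You can check this by inspecting response.text.

    Images, CSS files, and scripts are only fetched when a browser renders the page and runs its JavaScript. That rendering is what you are referring to, and Scrapy does not do it.

    If you want to hit the server less, add a download delay and reduce the number of concurrent requests in settings.py.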
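
As a rough sketch of that last point, the settings below slow the crawl down. The setting names are standard Scrapy settings, but the values are only illustrative assumptions and should be tuned for the site you are scraping.

```python
# settings.py (fragment) -- values are illustrative assumptions, not recommendations

# Wait this many seconds between consecutive requests to the same site
# (Scrapy's default is 0, i.e. no delay).
DOWNLOAD_DELAY = 2

# Upper bound on requests in flight across all sites (Scrapy's default is 16).
CONCURRENT_REQUESTS = 4

# Upper bound on simultaneous requests to any single domain (default is 8).
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Optionally let the AutoThrottle extension adjust the delay
# based on how quickly the server responds.
AUTOTHROTTLE_ENABLED = True
```

The same keys can also be set for a single spider through its custom_settings class attribute instead of the project-wide settings.py.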