I need to prevent images, CSS files, themes, and scripts from being loaded when I do a basic scrape of a web page with Scrapy.
Is there a way to block them from settings.py or somewhere else?
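import scrapy


class MySpyder(scrapy.Spider):
    name = 'Spiderr'
    # One URL per line in the file "Archive"; blank lines are skipped.
    start_urls = [l.strip() for l in open("Archive") if l.strip()]

    def parse(self, response):
        # Assumes the page title is what was wanted. The original XPath
        # "/html/body/" ends in a slash, which is not a valid expression.
        title = response.xpath("/html/head/title/text()").get(default='').strip()
        url = response.url
        yield {
            'title': title,
            'URL': url,
        }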
I assume that would put less load on the website.
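Scrapy only downloads the page source (the HTML) for each request; that is all the response contains.
You can check this with response.text.
It does not fetch the images, CSS files, or scripts the page links to, and it does not render JavaScript, which seems to be what you are referring to.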
If you want to hit the server less, add a download delay and reduce the number of concurrent requests in settings.py.
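As a rough sketch, the relevant part of settings.py could look like this. The setting names are standard Scrapy settings, but the values are only illustrative assumptions; tune them to what the target site can tolerate.

```python
# settings.py (fragment; the values below are illustrative, not recommendations)

# Seconds to wait between consecutive requests to the same site.
DOWNLOAD_DELAY = 2

# Upper bound on requests in flight across all sites.
CONCURRENT_REQUESTS = 4

# Upper bound on requests in flight to any single domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Optional: let Scrapy adapt the delay to the server's response times.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1
AUTOTHROTTLE_MAX_DELAY = 10
```

The same values can also be set for one spider only through its custom_settings class attribute, if you do not want them to apply to the whole project.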