I developed a Scrapy spider and I want to execute it without using the command line, which is why I use CrawlerProcess. I also want the output saved to a JSON file. Feed exporters are perfect for my case, and the only way I could get them to work was by updating the settings this way:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

from scraper.spiders.pp_spider import ConverterSpider

settings = get_project_settings()
settings.set('FEED_FORMAT', 'json')
settings.set('FEED_URI', 'result.json')

process = CrawlerProcess(settings)
process.crawl(ConverterSpider)
process.start()
Now I would like to overwrite the output file result.json whenever a new crawl is executed. The usual way to do this, declaring the FEEDS setting in settings.py, didn't work for me with CrawlerProcess (example):

FEEDS = {
    'result.json': {'format': 'json', 'overwrite': True},
}
I would like to know how to do something like:
settings.set('overwrite', True)
My bad, the solution is pretty simple (as wRAR suggested): assign the FEEDS dict directly on the settings object before creating the process:

settings = get_project_settings()
settings['FEEDS'] = {
    'result.json': {'format': 'json', 'overwrite': True},
}

process = CrawlerProcess(settings)
process.crawl(ConverterSpider)
process.start()
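For completeness, here is a minimal sketch of the same script written with the set() method used earlier, which is equivalent since item assignment on a Settings object delegates to set(). The spider module and class name are the ones from the question above:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

from scraper.spiders.pp_spider import ConverterSpider

settings = get_project_settings()
# Equivalent to settings['FEEDS'] = {...}: item assignment on a
# Settings object delegates to set() under the hood.
settings.set('FEEDS', {
    'result.json': {'format': 'json', 'overwrite': True},
})

process = CrawlerProcess(settings)
process.crawl(ConverterSpider)
process.start()

As a side note, FEED_FORMAT and FEED_URI are deprecated in recent Scrapy versions (2.1+) in favour of FEEDS, so switching to FEEDS also avoids the deprecation warnings from the first snippet.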