Tags: python, scrapy, export

Scrapy - Setting Feed Exporter Overwrite to True


I developed a Scrapy spider and want to run it from a script instead of the command line, which is why I use CrawlerProcess. I also want the output saved to a JSON file. Feed exports fit my case perfectly, and the only way I could get them to work was by updating the settings like this:

from scraper.spiders.pp_spider import ConverterSpider
import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

settings = get_project_settings()
settings.set('FEED_FORMAT', 'json')
settings.set('FEED_URI', 'result.json')

process = CrawlerProcess(settings)

process.crawl(ConverterSpider)
process.start()

Now I would like result.json to be overwritten every time a new crawl runs. The usual way of doing this, with a FEEDS setting like the following, doesn't work for me with CrawlerProcess:

FEEDS = { 
    'result.json': {'format': 'json', 'overwrite': True} 
}

I would like to know how to do something like:

settings.set('overwrite', True)
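There is no top-level `overwrite` setting. `overwrite` is a per-feed option inside the `FEEDS` dict (available since Scrapy 2.4). A minimal sketch of the closest equivalent to `settings.set('overwrite', True)`, assuming you want every configured feed overwritten: the hypothetical helper `with_overwrite` copies a `FEEDS`-style dict and forces the option on each entry. It uses only the standard library, so it can be tried without Scrapy.

```python
from copy import deepcopy
from typing import Any, Dict

# Shape of Scrapy's FEEDS setting: {uri: {option_name: value, ...}}
FeedsDict = Dict[str, Dict[str, Any]]


def with_overwrite(feeds: FeedsDict, overwrite: bool = True) -> FeedsDict:
    """Return a copy of a FEEDS-style dict with 'overwrite' set on every feed.

    Hypothetical helper: since there is no global 'overwrite' setting,
    the option is written into each feed's own options dict.
    """
    result = deepcopy(feeds)  # leave the caller's dict untouched
    for options in result.values():
        options["overwrite"] = overwrite
    return result


if __name__ == "__main__":
    feeds = {"result.json": {"format": "json"}}
    print(with_overwrite(feeds))
```

In a script, the result could then be applied before creating the process, e.g. `settings.set('FEEDS', with_overwrite(settings.getdict('FEEDS')))`, where `getdict` returns a copy of whatever `FEEDS` the project settings already define.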

Solution

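  • My bad, the solution is pretty simple (as wRAR suggested): set FEEDS directly on the settings object returned by get_project_settings() before creating the CrawlerProcess: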

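    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings
    from scraper.spiders.pp_spider import ConverterSpider

    settings = get_project_settings()
    # FEEDS replaces FEED_FORMAT/FEED_URI; 'overwrite' requires Scrapy 2.4+
    settings['FEEDS'] = {
        'result.json': {'format': 'json', 'overwrite': True}
    }

    process = CrawlerProcess(settings)
    process.crawl(ConverterSpider)
    process.start()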
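Why the flag matters for a JSON feed: for local files, Scrapy's `overwrite` feed option defaults to `False`, so the exporter appends to an existing file. A JSON feed is a single array, so a second crawl appends a second array and the file stops being valid JSON. The stdlib-only sketch below imitates that behavior with a hypothetical `export_items` function (not Scrapy's exporter, just `open()` in `'a'` vs `'w'` mode) to show the difference.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def export_items(path: str, items: List[Dict[str, Any]], overwrite: bool) -> None:
    """Write items as one JSON array, roughly like a JSON feed export.

    'w' truncates the file first (overwrite=True); 'a' appends to it
    (overwrite=False, the default for local files).
    """
    mode = "w" if overwrite else "a"
    with open(path, mode, encoding="utf-8") as f:
        json.dump(items, f)


def is_valid_json(path: str) -> bool:
    """Return True if the whole file parses as a single JSON document."""
    with open(path, encoding="utf-8") as f:
        try:
            json.load(f)
        except json.JSONDecodeError:
            return False
    return True


def simulate_two_crawls(overwrite: bool) -> bool:
    """Export twice to the same file and report whether the result is valid JSON."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "result.json")
        export_items(path, [{"run": 1}], overwrite)
        export_items(path, [{"run": 2}], overwrite)
        return is_valid_json(path)


if __name__ == "__main__":
    # Appending leaves two arrays back to back: [{"run": 1}][{"run": 2}]
    print("append:", simulate_two_crawls(overwrite=False))
    # Overwriting keeps only the last run: [{"run": 2}]
    print("overwrite:", simulate_two_crawls(overwrite=True))
```

If you actually want to accumulate results across crawls, the `jsonlines` feed format tolerates appending, since each item is written on its own line.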