Disable Scrapyd item storing in .jl feed


I want to know how to disable Item storing in scrapyd.

What I tried

I deploy a spider to the Scrapy daemon Scrapyd. The deployed spider stores the scraped data in a database, and that works fine.

However, Scrapyd logs each scraped Scrapy item. You can see this when examining the scrapyd web interface. The item data is stored in ..../items/<project name>/<spider name>/<job name>.jl

I have no clue how to disable this. I run scrapyd in a Docker container and it uses way too much storage.

I have tried the approach from "suppress Scrapy Item printed in logs after pipeline", but it seems to do nothing for scrapyd's logging. Scrapyd appears to ignore all spider logging settings.

Edit: I found this entry in the documentation about item storing. It seems that if you omit the items_dir setting, item logging will not happen, and the documentation says this is disabled by default. I do not have a scrapyd.conf file, so item logging should be disabled. It is not.


  • After writing my answer I re-read your question and I see that what you want has nothing to do with logging; it is about not writing to the (default-ish) .jl feed (maybe update the title to "Disable scrapyd Item storing"). To override scrapyd's default, set FEED_URI to an empty string like this:

    $ curl http://localhost:6800/schedule.json -d project=tutorial -d spider=example -d setting=FEED_URI=
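
    The same request can also be prepared from Python. This is only a minimal sketch using the standard library: the helper name build_schedule_request is hypothetical, and the host, project and spider names are simply the ones from the curl example above. It builds the POST request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_schedule_request(base_url, project, spider, settings):
    """Build (but do not send) a POST request for scrapyd's schedule.json.

    `settings` maps Scrapy setting names to values; each one is sent as a
    repeated `setting=NAME=VALUE` form field, like `-d setting=...` in curl.
    """
    fields = [("project", project), ("spider", spider)]
    fields += [("setting", "%s=%s" % (name, value)) for name, value in settings.items()]
    data = urlencode(fields).encode("ascii")
    # Passing `data` makes urllib issue a POST instead of a GET.
    return Request(base_url + "/schedule.json", data=data)


# Assumed values taken from the curl example: local scrapyd, tutorial/example.
req = build_schedule_request(
    "http://localhost:6800", "tutorial", "example", {"FEED_URI": ""}
)
```

    To actually schedule the job you would pass req to urllib.request.urlopen while scrapyd is running.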

    For other people who are looking into logging... Let's see an example. We do the usual:

    $ scrapy startproject tutorial
    $ cd tutorial
    $ scrapy genspider example example.com  # genspider also needs a domain; example.com is a placeholder

    then edit the generated spider file under tutorial/spiders/ to contain the following:

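    import scrapy


    class TutorialItem(scrapy.Item):
        name = scrapy.Field()
        surname = scrapy.Field()


    class ExampleSpider(scrapy.Spider):
        name = "example"
        # Placeholder URL (assumption): the original start URL is missing here.
        start_urls = ["http://www.example.com/"]

        def parse(self, response):
            # Yield 100 dummy items so item output is easy to spot in the log.
            for i in range(100):
                t = TutorialItem()
                t["name"] = "foo"
                t["surname"] = "bar %d" % i
                yield t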

    Notice the difference between running:

    $ scrapy crawl example
    # or
    $ scrapy crawl example -L DEBUG
    # or
    $ scrapy crawl example -s LOG_LEVEL=DEBUG

    and:

    $ scrapy crawl example -s LOG_LEVEL=INFO
    # or
    $ scrapy crawl example -L INFO

    By trying such combinations on your spider, confirm that it doesn't print item info at log levels above DEBUG.
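
    The reason this works is that Scrapy reports each scraped item in a DEBUG-level message, so any higher level filters it out. The following standard-library sketch mimics that filtering; the logger name demo.scraper and the message texts are made up for illustration and are not Scrapy's own.

```python
import io
import logging


def emit_at(level):
    """Log one DEBUG and one INFO message at the given level; return the output."""
    buf = io.StringIO()
    logger = logging.getLogger("demo.scraper")  # hypothetical logger name
    logger.handlers = [logging.StreamHandler(buf)]
    logger.propagate = False  # keep the demo output out of the root logger
    logger.setLevel(level)
    logger.debug("Scraped item: %s", {"name": "foo", "surname": "bar 0"})
    logger.info("Spider closed")
    return buf.getvalue()


debug_output = emit_at(logging.DEBUG)  # contains the item line
info_output = emit_at(logging.INFO)    # item line is filtered out
```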

    Now, after you deploy to scrapyd, do exactly the same:

    $ curl http://localhost:6800/schedule.json -d setting=LOG_LEVEL=INFO -d project=tutorial -d spider=example

    Confirm that the job's log, which you can open from the scrapyd web interface, doesn't contain items.

    Note that if your items are still printed at the INFO level, it likely means that your own code or some pipeline is printing them. You could raise the log level further, or find the code that prints them and remove it.
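
    As an illustration of what to look for, here is a sketch of a pipeline that stays quiet at INFO. The class name QuietPipeline is hypothetical and the database write is left out. A bare print() is not filtered by LOG_LEVEL at all, whereas a DEBUG-level logger call is.

```python
import logging

logger = logging.getLogger(__name__)


class QuietPipeline:
    """Hypothetical item pipeline that does not leak items into INFO-level logs."""

    def process_item(self, item, spider):
        # print(item)  # avoid: print() ignores LOG_LEVEL entirely
        # Item details go to DEBUG only, so LOG_LEVEL=INFO hides them.
        logger.debug("Processed item %r", item)
        # (The database write would go here; omitted in this sketch.)
        return item


# Stand-alone usage with a plain dict standing in for a Scrapy item.
result = QuietPipeline().process_item({"name": "foo", "surname": "bar 0"}, spider=None)
```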