
Disable Scrapyd item storing in .jl feed


Question

I want to know how to disable item storing in Scrapyd.

What I tried

I deploy a spider to the Scrapy daemon Scrapyd. The deployed spider stores the scraped data in a database, and that part works fine.

However, Scrapyd also records each scraped item. You can see this in the Scrapyd web interface: the item data is stored in ..../items/<project name>/<spider name>/<job name>.jl

I have no clue how to disable this. I run Scrapyd in a Docker container, and these item feeds use far too much storage.

I have tried the approach from suppress Scrapy Item printed in logs after pipeline, but it seems to do nothing for Scrapyd; all spider logging settings appear to be ignored by Scrapyd.

Edit: I found this entry in the documentation about item storing. It seems that if you omit the items_dir setting, item storing will not happen; the documentation says it is disabled by default. I do not have a scrapyd.conf file, so item storing should be disabled. It is not.
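For reference, here is a minimal scrapyd.conf sketch (following the [scrapyd] section layout from the Scrapyd documentation) that would disable item storing explicitly. Note that some Scrapyd packages and Docker images ship a default configuration in which items_dir is set to a non-empty value, which would explain this behaviour:

    [scrapyd]
    # An empty items_dir means scraped items are not written to .jl feed files.
    items_dir =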


Solution

  • After writing my answer I re-read your question and saw that what you want has nothing to do with logging; it's about not writing to the (default-ish) .jl feed (maybe update the title to "Disable Scrapyd item storing"). To override Scrapyd's default, just set FEED_URI to an empty string like this:

    $ curl http://localhost:6800/schedule.json -d project=tutorial -d spider=example -d setting=FEED_URI=
    
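    If you'd rather schedule from Python than from curl, here's a minimal sketch using the requests library; it assumes the same Scrapyd instance on localhost:6800 and the same tutorial/example project as the curl call above:

    import requests

    # Same call as the curl above: schedule the spider with FEED_URI set to
    # an empty string so Scrapyd does not write items to its .jl feed.
    response = requests.post(
        "http://localhost:6800/schedule.json",
        data={"project": "tutorial", "spider": "example", "setting": "FEED_URI="},
    )
    print(response.json())  # expect something like {"status": "ok", "jobid": "..."}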

    For other people who are looking into logging... Let's see an example. We do the usual:

    $ scrapy startproject tutorial
    $ cd tutorial
    $ scrapy genspider example example.com
    

    then edit tutorial/spiders/example.py to contain the following:

    import scrapy
    
    class TutorialItem(scrapy.Item):
        name = scrapy.Field()
        surname = scrapy.Field()
    
    class ExampleSpider(scrapy.Spider):
        name = "example"
    
        start_urls = (
            'http://www.example.com/',
        )
    
        def parse(self, response):
            # Yield a batch of dummy items so the item log lines are easy to observe.
            for i in range(100):  # range, not xrange: modern Scrapy runs on Python 3
                t = TutorialItem()
                t['name'] = "foo"
                t['surname'] = "bar %d" % i
                yield t
    

    Notice the difference between running:

    $ scrapy crawl example
    # or
    $ scrapy crawl example -L DEBUG
    # or
    $ scrapy crawl example -s LOG_LEVEL=DEBUG
    

    and

    $ scrapy crawl example -s LOG_LEVEL=INFO
    # or
    $ scrapy crawl example -L INFO
    

    Try such combinations on your spider to confirm that item data is not printed at any log level above DEBUG.

    It's now time, after you deploy to Scrapyd, to do exactly the same:

    $ curl http://localhost:6800/schedule.json -d setting=LOG_LEVEL=INFO -d project=tutorial -d spider=example
    

    Confirm that the job's log in the Scrapyd web interface no longer contains item data.

    Note that if your items are still printed at INFO level, it likely means that your own code or some pipeline is printing them. You could raise the log level further and/or investigate to find the code that prints them and remove it.
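    If you'd rather silence the per-item "Scraped from ..." lines at the source than rely on log levels, a custom log formatter can drop them entirely. This is only a sketch, assuming Scrapy 2.0+ (where LogFormatter methods may return None to skip a message); the module path tutorial/logformatter.py is just an example:

    # tutorial/logformatter.py (example location)
    from scrapy import logformatter

    class QuietItemLogFormatter(logformatter.LogFormatter):
        def scraped(self, item, response, spider):
            # Returning None drops the "Scraped from <response>" log entry
            # no matter what log level is configured.
            return None

    Then enable it in settings.py with LOG_FORMATTER = 'tutorial.logformatter.QuietItemLogFormatter'.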