
Python Scrapy Pipeline Edit Last Item?


I'm using a pipeline in Scrapy to write the scraped results to a JSON file. The pipeline writes a comma after each scraped item; however, I want to omit the comma after the last item. Is there a way to do that?

This is the pipeline:

import json

class ExamplePipeline(object):
    def open_spider(self, spider):
        self.file = open('example.json', 'w')
        self.file.write("[")

    def close_spider(self, spider):
        self.file.write("]")
        self.file.close()

    def process_item(self, item, spider):
        line = json.dumps(
            dict(item),
            indent=4,
            sort_keys=True,
            separators=(',', ': ')
        ) + ",\n"
        self.file.write(line)
        return item

And the sample output looks like:

[
{
    "item1": "example",
    "item2": "example"
},
{
    "item1": "example",
    "item2": "example"
},
]

What is the Python way to detect the last item and not give it a comma separator? I thought I could do something like if item[-1] ..., but I can't get that working.

Any ideas?


Solution

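  • One way to do this in your pipeline is to seek back in the file when the spider closes and cut off the trailing comma before writing the closing bracket:

    See related Python - Remove very last character in file

    Note that in Python 3 a text-mode file rejects nonzero end-relative seeks, so this assumes the file is opened in binary mode ('wb') and that every write passes bytes (for example line.encode('utf-8') in process_item).

    import os

    class ExamplePipeline(object):

        def close_spider(self, spider):
            # go back 2 bytes: the trailing "," and "\n"
            self.file.seek(-2, os.SEEK_END)
            # cut the trailing data
            self.file.truncate()
            # write the closing bracket and close the file
            self.file.write(b"]")
            self.file.close()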
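
If seeking back in the file is awkward (for example, when no items were scraped there is no trailing comma to remove), another option is to write the separator before every item except the first, so nothing needs to be deleted at the end. The following is only a minimal sketch of that idea, not part of the original answer: the class name JsonArrayPipeline, the path argument, and the first_item flag are hypothetical names introduced for illustration.

```python
import json


class JsonArrayPipeline(object):
    """Hypothetical variant: emit the comma *before* each item except the first."""

    def __init__(self, path='example.json'):
        # 'path' is an illustrative parameter; Scrapy will use the default
        # when it instantiates the pipeline without arguments.
        self.path = path

    def open_spider(self, spider):
        self.file = open(self.path, 'w')
        self.file.write("[\n")
        # No item has been written yet, so no separator is needed.
        self.first_item = True

    def close_spider(self, spider):
        # The last item was written without a trailing comma,
        # so the array can simply be closed.
        self.file.write("\n]")
        self.file.close()

    def process_item(self, item, spider):
        if not self.first_item:
            # Separate this item from the previous one.
            self.file.write(",\n")
        self.first_item = False
        self.file.write(json.dumps(dict(item), indent=4, sort_keys=True))
        return item
```

With this layout the file should remain a valid JSON array whether the spider yields many items, one item, or none, and it does not depend on the file mode or the platform's line endings.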