I'm using a pipeline in Scrapy to output the scraped results into a JSON file. The pipeline places a comma after each scraped item; however, I want to drop the comma after the last item. Is there a way to do that?
This is the pipeline:
import json

class ExamplePipeline(object):
    def open_spider(self, spider):
        self.file = open('example.json', 'w')
        self.file.write("[")

    def close_spider(self, spider):
        self.file.write("]")
        self.file.close()

    def process_item(self, item, spider):
        # every item is followed by ",\n", including the last one
        line = json.dumps(
            dict(item),
            indent=4,
            sort_keys=True,
            separators=(',', ': ')
        ) + ",\n"
        self.file.write(line)
        return item
And the sample output looks like:
[
{
    "item1": "example",
    "item2": "example"
},
{
    "item1": "example",
    "item2": "example"
},
]
What is the Python way to find the last item and write it without a comma separator? I thought I could do something like if item[-1] ... but I can't get that working.
Any ideas?
To apply this to your pipeline, you'll have to seek back in your file and delete that trailing comma (see the related question "Python - Remove very last character in file"):
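import os

class ExamplePipeline(object):
    def close_spider(self, spider):
        self.file.close()
        # reopen in binary mode: Python 3 text files reject nonzero end-relative seeks
        with open('example.json', 'rb+') as f:
            # go back over the trailing "," and the newline
            # (assumes at least one item was written)
            f.seek(-(1 + len(os.linesep)), os.SEEK_END)
            # cut trailing data
            f.truncate()
            # close the JSON array
            f.write(b"]")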
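If you would rather not seek and truncate at all, another option is to write the separator before every item except the first, so there is never a trailing comma to remove. This is only a minimal sketch, not something Scrapy provides: the class name `FirstItemFlagPipeline`, the `path` argument and the `first_item` flag are made up for illustration, and the output layout differs slightly from the one in the question.

```python
import json


class FirstItemFlagPipeline(object):
    """Hypothetical variant: emit ",\n" before each item except the first."""

    def __init__(self, path='example.json'):
        # 'path' is an illustrative parameter, defaulting to the question's file name
        self.path = path

    def open_spider(self, spider):
        self.file = open(self.path, 'w')
        self.file.write("[\n")
        # nothing has been written yet, so the next item needs no separator
        self.first_item = True

    def close_spider(self, spider):
        self.file.write("\n]")
        self.file.close()

    def process_item(self, item, spider):
        if self.first_item:
            self.first_item = False
        else:
            # separate this item from the previous one
            self.file.write(",\n")
        self.file.write(json.dumps(dict(item), indent=4, sort_keys=True))
        return item
```

With this approach the file is still a valid (empty) JSON array when the spider scrapes nothing, which the seek-based version does not handle.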