I'm running a basic Scrapy crawler and I can't find anything in the Scrapy documentation that lets me change the delimiter used for the output of .getall(). The default appears to be comma-separated, and I assume this could cause errors when the data is imported elsewhere.
Ideally, the exported CSV would stay comma-separated while the .getall() data inside a field is pipe- or semicolon-separated. I would prefer to fix this efficiently within the Scrapy script. For example, say the bit containing the .getall() is:
    def entry_parse(self, response):
        for entry in response.xpath("//tbody[@class='entry-grid-body infinite']//td[@class]"):
            yield {'entry_labels': entry.xpath(".//div[@class='entry-labels']/span/text()").getall()}
It would be nice to be able to pass such an argument into getall() or something similar, but I can't find any documentation that allows it. Any ideas would be helpful! Thanks.
This is not really a Scrapy problem. The .getall() method returns a list, and the repr of a list uses commas by default:
    >>> repr(["a", "b"])
    "['a', 'b']"
You can use json.dumps with its separators argument to change the delimiter before yielding the item:
    import json

    def entry_parse(self, response):
        for entry in response.xpath("//tbody[@class='entry-grid-body infinite']//td[@class]"):
            yield {
                'entry_labels': json.dumps(
                    entry.xpath(".//div[@class='entry-labels']/span/text()").getall(),
                    separators=("|", ":"),
                )
            }
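To make the effect of the separators argument concrete, here is a minimal sketch of what that json.dumps call produces for a small list. The `labels` list is a made-up stand-in for whatever .getall() returns on your page. Note that json.dumps keeps the JSON brackets and quotes, so if you only want the bare values, a plain str.join is a possible alternative (it is not part of the approach above, just an option worth considering).

```python
import json

# Hypothetical stand-in for the list returned by .getall()
labels = ["a", "b", "c"]

# json.dumps with a custom item separator: brackets and quotes are kept
as_json = json.dumps(labels, separators=("|", ":"))

# Plain str.join: only the values, separated by the pipe character
as_joined = "|".join(labels)

print(as_json)    # ["a"|"b"|"c"]
print(as_joined)  # a|b|c
```

Either string can be yielded in place of the raw list. The joined form is easier to split again downstream, while the JSON form preserves quoting if a label itself could contain a pipe.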