Currently my spiders send data to my site like this:
import requests
import json

def parse_product(response, **cb_kwargs):
    item = {}
    item['url'] = response.url
    data = {
        "source_id": 505,
        "token": f"{API_TOKEN}",
        "products": [item]
    }
    headers = {'Content-Type': 'application/json'}
    url = 'http://some.site.com/api/'
    requests.post(url=url, headers=headers, data=json.dumps(data))
Is it possible to implement this through a pipeline or middleware, since it is inconvenient to repeat this code in every spider?
P.S. The data needs to be sent in JSON format (json.dumps(data)); if I make the item a MyItemClass() instance, an error occurs...
It can be done using a pipeline fairly easily. You can also use scrapy's Item class and Field class, as long as you cast them to a dict prior to calling json.dumps.
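A minimal Item along those lines could look like this (the ProductItem name and its single url field are placeholders, not something from your project):

import scrapy

class ProductItem(scrapy.Item):
    # declare only the fields your spider actually scrapes;
    # url is taken from the code in the question
    url = scrapy.Field()

The spider callback would then yield ProductItem(url=response.url) instead of posting the data itself, and the pipeline takes care of the request.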
For Example:
import requests
import json

class Pipeline:
    def process_item(self, item, spider):
        # cast the scrapy Item (or plain dict) to a dict so json.dumps can serialize it
        data = dict(item)
        headers = {'Content-Type': 'application/json'}
        url = 'http://some.site.com/api/'
        requests.post(url=url, headers=headers, data=json.dumps(data))
        return item
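If you need to keep the exact payload shape from the question (source_id, token and a products list), a variation along these lines should also work. This is just a sketch: the ApiExportPipeline name and the API_TOKEN setting are assumptions, and from_crawler is only used to read that token from settings.py.

import requests
import json

class ApiExportPipeline:
    def __init__(self, api_token):
        self.api_token = api_token

    @classmethod
    def from_crawler(cls, crawler):
        # API_TOKEN is an assumed custom setting defined in settings.py
        return cls(api_token=crawler.settings.get('API_TOKEN'))

    def process_item(self, item, spider):
        # reproduce the payload shape from the question
        data = {
            "source_id": 505,
            "token": self.api_token,
            "products": [dict(item)],
        }
        headers = {'Content-Type': 'application/json'}
        requests.post('http://some.site.com/api/',
                      headers=headers, data=json.dumps(data))
        return item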
If you use this example, it will be called for each and every item you yield from your spider. Just remember to activate it in your settings.py file.
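For instance, assuming the pipeline lives in myproject/pipelines.py (the module path is a placeholder for your own project layout):

ITEM_PIPELINES = {
    'myproject.pipelines.Pipeline': 300,
}

The number (300) is just the pipeline's order relative to any other pipelines you have enabled; lower values run first.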