Tags: scrapy, scrapy-pipeline

Scrapy: how to send items to the site via the API


At the moment my spiders send data to my site like this:

import json

import requests

def parse_product(response, **cb_kwargs):
    item = {}
    item['url'] = response.url
    # API_TOKEN is defined elsewhere in the project
    data = {
        "source_id": 505,
        "token": f"{API_TOKEN}",
        "products": [item],
    }
    headers = {'Content-Type': 'application/json'}
    url = 'http://some.site.com/api/'
    requests.post(url=url, headers=headers, data=json.dumps(data))

Is it possible to implement this through a pipeline or middleware instead? It is inconvenient to repeat this code in every spider.

P.S. The data needs to be sent as JSON (json.dumps(data)); if I make the item an instance of an Item class (item = MyItemClass()), an error occurs...


Solution

  • It can be done fairly easily with a pipeline. You can also use Scrapy's Item class and Field class, as long as you cast the item to a dict before calling json.dumps (a sketch of that follows the pipeline example below).

    For example:

    import json

    import requests

    class Pipeline:

        def process_item(self, item, spider):
            # dict() works for plain dicts and scrapy.Item instances alike
            data = dict(item)
            headers = {'Content-Type': 'application/json'}
            url = 'http://some.site.com/api/'
            requests.post(url=url, headers=headers, data=json.dumps(data))
            return item
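
    As a minimal sketch of the Item-based variant (the ProductItem class and its fields are hypothetical, not from the question), casting to a dict is what makes json.dumps work; calling it on the raw Item is the error the question mentions:

    import json

    import scrapy

    class ProductItem(scrapy.Item):
        # hypothetical fields, for illustration only
        url = scrapy.Field()
        name = scrapy.Field()

    item = ProductItem(url='http://example.com/p/1', name='widget')
    # json.dumps(item) raises TypeError: Object of type ProductItem is not
    # JSON serializable; casting to a dict first avoids that:
    payload = json.dumps(dict(item))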
    

    If you use this example, process_item will be called for each and every item your spiders yield. Just remember to activate the pipeline in your settings.py file, as sketched below.
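
    For instance, assuming the pipeline lives in your project's pipelines.py (the myproject.pipelines path and the 300 priority are placeholders to adapt), the activation looks roughly like this:

    # settings.py
    ITEM_PIPELINES = {
        # assumed module path; the number sets the run order (lower runs first)
        'myproject.pipelines.Pipeline': 300,
    }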