
How to authenticate Yelp API in scrapy? Pass Secret_Token and Search params?


Below is my code, which produces a 400 error in the Scrapy log. My logic is as follows: 1) I use a POST request to get my Secret_Token. 2) I set my header to use the secret token and define the parameters for the API search string. I also believe the header with the Secret_Token should be passed as meta for further requests. 3) I expect the parse function to receive the JSON response from the request in step 2 and parse it into items. After that, I want to loop inside the parse method over a list of parameters, reusing the working request from step 2.

The problem is that it does not work (log attached). Am I passing the parameters and the secret token correctly, and how can I pass the secret token in meta?

import scrapy
import json
import requests
import pprint



class YelpSpider(scrapy.Spider):
    name = "yelp"
    allowed_domains = ["https://api.yelp.com"]

    def start_requests(self):
        params = {
            'grant_type': 'client_credentials',
            'client_id': '*******',
            'client_secret' : '*******'
        }  

        request = requests.post('https://api.yelp.com/oauth2/token', params = params)



        bearer_token = request.json()['access_token']
        headers = {'Authorization' : 'Bearer %s' % bearer_token}

        params = {
                    'term': 'restaurant',
                    'offset': 20,
                    'cc' : 'AU',
                    'location': 4806
                }

        yield scrapy.Request('https://api.yelp.com/v3/businesses/search', headers=headers, cookies=params, callback=self.parse)

    def parse(self, response):
        item = response.json()['businesses']
        return item

Solution

  • Below is fully functional code for the Yelp Fusion API with Scrapy. I have yet to implement the URL generation logic based on postal code and the offset parameter to retrieve up to 1000 entries, and I still need to implement items. Please comment if you have recommendations on how to improve the code.

    P.S. The Fusion API has increased the number of results returned per request to 50, so you can now use 'limit': 50, 'offset': 50.

    # -*- coding: utf-8 -*-
    import json
    from urllib.parse import urlencode

    import scrapy


    class YelpSpider(scrapy.Spider):
        name = "yelp"

        def start_requests(self):
            # As per the Yelp docs, the credentials are sent as a POST request
            # to obtain an access_token. The response is handled in a separate
            # callback (parse) because I do not know how to do it all in one.
            params = {
                'grant_type': 'client_credentials',
                'client_id': '**********',
                'client_secret': '************',
            }

            yield scrapy.Request(
                url='https://api.yelp.com/oauth2/token',
                method="POST",
                body=urlencode(params),
            )

        def parse(self, response):
            # Retrieve the access token from the response object and set the
            # header according to the Yelp docs.
            bearer_token = json.loads(response.body)['access_token']
            headers = {'Authorization': 'Bearer %s' % bearer_token}

            # Set the search parameters.
            params = {
                'term': 'restaurant',
                'offset': 20,
                'cc': 'AU',
                'location': 4806,
            }
            # Base search URL for the Fusion API.
            url = "https://api.yelp.com/v3/businesses/search"

            # Form the GET request to receive the final info as JSON.
            # Unfortunately I did not find a more appropriate way to pass
            # params in Scrapy than the one shown below.
            yield scrapy.Request(
                url=url + '?' + urlencode(params),
                method="GET",
                headers=headers,
                callback=self.parse_items,
            )

        def parse_items(self, response):
            # Parse the needed items.
            resp = json.loads(response.body)['businesses']
            print(resp)
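
The URL generation that is still missing (postal code plus offset, up to 1000 entries) might look roughly like the sketch below. It is only an illustration: `build_search_requests` and its arguments are hypothetical names, not part of Scrapy or the Yelp API, and the cap of 1000 results and page size of 50 are assumptions taken from the figures quoted in this answer. It also returns the same Authorization header with every URL, which is one simple way to carry the token to each follow-up request.

```python
from urllib.parse import urlencode

# Base search URL, as used in the answer's spider.
SEARCH_URL = "https://api.yelp.com/v3/businesses/search"


def build_search_requests(bearer_token, postal_codes, term="restaurant",
                          cc="AU", limit=50, max_results=1000):
    """Return (url, headers) pairs, one per result page per postal code.

    Hypothetical helper: the name, signature, and defaults are illustrative.
    """
    # The same bearer header is reused for every request.
    headers = {"Authorization": "Bearer %s" % bearer_token}
    pairs = []
    for postal_code in postal_codes:
        # Step through the result set one page at a time.
        for offset in range(0, max_results, limit):
            params = {
                "term": term,
                "location": postal_code,
                "cc": cc,
                # Shrink the last page so offset + limit never exceeds the cap.
                "limit": min(limit, max_results - offset),
                "offset": offset,
            }
            pairs.append((SEARCH_URL + "?" + urlencode(params), headers))
    return pairs
```

Inside the spider's parse callback, each pair could then be yielded as scrapy.Request(url, headers=headers, callback=self.parse_items), in the same way as the single search request above.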