Below is my code, which produces a 400 error in the Scrapy log. The logic behind it is as follows:
1) I use a POST request to get my secret token.
2) I set my headers to use the secret token and define the parameters for the API search string. I also believe the header with the secret token should be passed as meta to subsequent requests.
3) I expect the parse function to receive the JSON response from the request in step 2 and parse it into items. After that, the parse method should loop over a list of parameters, reusing the (by then working) request from step 2 for each one.
The problem is that it does not work (log attached). Am I passing the parameters and the secret token correctly, and how can I pass the secret token in meta?
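On the meta question: a minimal sketch, assuming the token is sent in the `Authorization` header of every request and also copied into a `meta` dict so that later callbacks can read it back. The helper names `build_request_kwargs` and `next_request_kwargs`, the `"bearer_token"` meta key and the token value `"abc123"` are hypothetical placeholders; the returned dicts only use keyword names that `scrapy.Request` accepts (`url`, `headers`, `meta`), so they could be passed as `scrapy.Request(callback=..., **kwargs)`.

```python
def build_request_kwargs(url, bearer_token):
    """Return keyword arguments for a scrapy.Request carrying the token."""
    return {
        "url": url,
        # Yelp expects the token in the Authorization header of each API call.
        "headers": {"Authorization": "Bearer %s" % bearer_token},
        # Keep a copy in meta so the next callback can build follow-up requests.
        "meta": {"bearer_token": bearer_token},
    }


def next_request_kwargs(meta, url):
    """Hypothetical helper: reuse the token stored in a previous request's meta."""
    return build_request_kwargs(url, meta["bearer_token"])


first = build_request_kwargs(
    "https://api.yelp.com/v3/businesses/search?location=4806", "abc123"
)
# In a Scrapy callback this dict would be response.meta.
second = next_request_kwargs(
    first["meta"], "https://api.yelp.com/v3/businesses/search?location=4810"
)
print(second["headers"]["Authorization"])  # Bearer abc123
```

Inside a spider, the callback that handles the first response would read `response.meta["bearer_token"]` (the meta of the request that produced that response) and yield the next request with the same header.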
import scrapy
import json
import requests
import pprint

class YelpSpider(scrapy.Spider):
    name = "yelp"
    allowed_domains = ["https://api.yelp.com"]

    def start_requests(self):
        params = {
            'grant_type': 'client_credentials',
            'client_id': '*******',
            'client_secret' : '*******'
        }
        request = requests.post('https://api.yelp.com/oauth2/token', params = params)
        bearer_token = request.json()['access_token']
        headers = {'Authorization' : 'Bearer %s' % bearer_token}
        params = {
            'term': 'restaurant',
            'offset': 20,
            'cc' : 'AU',
            'location': 4806
        }
        yield scrapy.Request('https://api.yelp.com/v3/businesses/search', headers = headers, cookies = params, callback= self.parse)

    def parse(self, response):
        item = response.json()['businesses']
        return item
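Two details in the code above look suspicious. `cookies = params` sends the search parameters in a Cookie header instead of the URL query string, so the API probably never sees `location`; the Yelp docs list `location` (or latitude/longitude) as required for business search, which would plausibly explain the 400 (an assumption, not confirmed by the log). Also, Scrapy's `allowed_domains` is meant to hold bare domain names rather than full URLs. A minimal standard-library sketch of both fixes, reusing the parameters from the question:

```python
from urllib.parse import urlencode, urlparse

search_url = "https://api.yelp.com/v3/businesses/search"
params = {"term": "restaurant", "offset": 20, "cc": "AU", "location": 4806}

# allowed_domains should list domains such as "api.yelp.com", not URLs;
# urlparse(...).netloc extracts exactly that part of the URL.
allowed_domains = [urlparse(search_url).netloc]

# Search parameters belong in the query string of the GET request.
full_url = search_url + "?" + urlencode(params)

print(allowed_domains)  # ['api.yelp.com']
print(full_url)  # https://api.yelp.com/v3/businesses/search?term=restaurant&offset=20&cc=AU&location=4806
```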
Below is fully functional code for the Yelp Fusion API with Scrapy. I have yet to implement the URL generation logic based on postal code and the offset parameter to retrieve up to 1000 entries, and to implement items. Please comment if you have any recommendations on how to improve the code.
P.S. The Fusion API has increased the maximum number of results per request to 50, so you can now use 'limit': 50, 'offset': 50.
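The planned URL generation could look roughly like the sketch below: one search URL per postal code and page, stepping `offset` by the page size of 50 mentioned above. It assumes that `offset + limit` may not exceed 1000 for the search endpoint; the function name `search_urls` and the second postal code are illustrative only.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.yelp.com/v3/businesses/search"
PAGE_SIZE = 50      # 'limit' value from the P.S. above
MAX_RESULTS = 1000  # assumed cap on offset + limit for business search


def search_urls(postcodes, term="restaurant", cc="AU"):
    """Yield one search URL per (postcode, page) pair."""
    for postcode in postcodes:
        # offsets 0, 50, ..., 950 so that offset + limit never exceeds 1000
        for offset in range(0, MAX_RESULTS, PAGE_SIZE):
            params = {"term": term, "cc": cc, "location": postcode,
                      "limit": PAGE_SIZE, "offset": offset}
            yield SEARCH_URL + "?" + urlencode(params)


urls = list(search_urls([4806, 4810]))
print(len(urls))  # 40
```

Generating every page up front is the simplest approach; a lighter alternative is to request only the first page per postal code and schedule further pages only while `offset` is below the `total` count returned in the search response.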
# -*- coding: utf-8 -*-
import scrapy
import json
import urllib

class YelpSpider(scrapy.Spider):
    name = "yelp"

    def start_requests(self):
        # As per the Yelp docs, the client credentials are sent in a POST request to get an access_token.
        # The token response is handled in a separate callback (parse), as I did not know how to do it all in one step.
        params = {
            'grant_type': 'client_credentials',
            'client_id': '**********',
            'client_secret' : '************'
        }
        yield scrapy.Request(url='https://api.yelp.com/oauth2/token', method="POST", body=urllib.urlencode(params))

    def parse(self, response):
        # Extract the access token from the response and set the header as described in the Yelp docs.
        bearer_token = json.loads(response.body)['access_token']
        headers = {'Authorization' : 'Bearer %s' % bearer_token}
        # Set search parameters.
        params = {
            'term': 'restaurant',
            'offset': 20,
            'cc' : 'AU',
            'location': 4806
        }
        # Base search URL for the Fusion API.
        url = "https://api.yelp.com/v3/businesses/search"
        # Build a GET request to receive the final results as JSON. I did not find a way to pass
        # query parameters to scrapy.Request other than appending them to the URL as shown below.
        yield scrapy.Request(url= url + '?' + urllib.urlencode(params), method="GET", headers=headers, callback=self.parse_items)

    def parse_items(self, response):
        # Parse the needed fields.
        resp = json.loads(response.body)['businesses']
        print resp
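For the "implement items" step, `parse_items` could yield one flat dict per business instead of printing the whole list; Scrapy accepts plain dicts yielded from callbacks as items. The sketch below assumes the field names `id`, `name`, `rating`, `review_count`, `phone` and `location.display_address` from the Yelp business search response; the helper name `businesses_to_items` and the sample body are made up for illustration. It is written for Python 3, unlike the Python 2 spider above.

```python
import json


def businesses_to_items(body):
    """Turn a business-search response body into a list of flat dicts."""
    data = json.loads(body)
    items = []
    for biz in data.get("businesses", []):
        location = biz.get("location") or {}
        items.append({
            "id": biz.get("id"),
            "name": biz.get("name"),
            "rating": biz.get("rating"),
            "review_count": biz.get("review_count"),
            "phone": biz.get("phone"),
            # display_address is a list of lines; join them into one string
            "address": ", ".join(location.get("display_address", [])),
        })
    return items


# Made-up sample body, only to exercise the function.
sample = ('{"businesses": [{"id": "b1", "name": "Cafe", "rating": 4.5, '
          '"review_count": 10, "phone": "+61700000000", '
          '"location": {"display_address": ["1 Example St", "Town QLD 4806"]}}], '
          '"total": 1}')
items = businesses_to_items(sample)
print(items[0]["address"])  # 1 Example St, Town QLD 4806
```

In the spider, `parse_items` would then loop over `businesses_to_items(response.body)` and yield each dict.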