I'm trying to write a parser for m-ati.su using Scrapy. As a first step, I need to get the values and text fields from the "From" and "To" comboboxes for different cities. I inspected the request in Firebug and wrote:
class spider(BaseSpider):
    name = 'ati_su'
    start_urls = ['http://m-ati.su/Tables/Default.aspx?EntityType=Load']
    allowed_domains = ["m-ati.su"]

    def parse(self, response):
        yield FormRequest('http://m-ati.su/Services/ATIGeoService.asmx/GetGeoCompletionList',
                          callback=self.ati_from,
                          formdata={'prefixText': 'moscow', 'count': '10', 'contextKey': 'All_0$Rus'})

    def ati_from(self, response):
        json = response.body
        open('results.txt', 'wb').write(json)
But I get a "500 Internal Server Error" for this request. What did I do wrong? Sorry for my bad English. Thanks.
I think you may have to add an X-Requested-With: XMLHttpRequest header to your POST request, so you can try this:
def parse(self, response):
    yield FormRequest('http://m-ati.su/Services/ATIGeoService.asmx/GetGeoCompletionList',
                      callback=self.ati_from,
                      formdata={'prefixText': 'moscow', 'count': '10', 'contextKey': 'All_0$Rus'},
                      headers={"X-Requested-With": "XMLHttpRequest"})
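If the header alone does not help, it may be worth checking how the request body is encoded. FormRequest sends its formdata as a URL-encoded form body, which is not the same thing as a JSON body. The sketch below uses only the standard library (Python 3) to show roughly what each encoding looks like for the parameters above; it does not contact the site, and the variable names are just for illustration.

```python
import json
from urllib.parse import urlencode

# Same parameters as in the question
params = {'prefixText': 'moscow', 'count': '10', 'contextKey': 'All_0$Rus'}

# Roughly what a form POST puts in the request body
# (Content-Type: application/x-www-form-urlencoded)
form_body = urlencode(params)

# The same parameters serialized as JSON
# (Content-Type: application/json)
json_body = json.dumps(params)

print(form_body)
print(json_body)
```

Comparing these with the body shown in the browser's network inspector should tell you which one the service actually expects.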
Edit: I tried running the spider and came up with the code below. The request body is JSON-encoded when I inspect it in Firefox, so I used Request with the method forced to "POST". The response I got was encoded in "windows-1251".
from scrapy.spider import BaseSpider
from scrapy.http import Request
import json


class spider(BaseSpider):
    name = 'ati_su'
    start_urls = ['http://m-ati.su/Tables/Default.aspx?EntityType=Load']
    allowed_domains = ["m-ati.su"]

    def parse(self, response):
        yield Request('http://m-ati.su/Services/ATIGeoService.asmx/GetGeoCompletionList',
                      callback=self.ati_from,
                      method="POST",
                      body=json.dumps({
                          'prefixText': 'moscow',
                          'count': '10',
                          'contextKey': 'All_0$Rus'
                      }),
                      headers={
                          "X-Requested-With": "XMLHttpRequest",
                          "Accept": "application/json, text/javascript, */*; q=0.01",
                          "Content-Type": "application/json; charset=utf-8",
                          "Pragma": "no-cache",
                          "Cache-Control": "no-cache",
                      })

    def ati_from(self, response):
        jsondata = response.body
        print json.loads(jsondata, encoding="windows-1251")
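Note that the callback above is Python 2 code. On Python 3, print is a function and json.loads no longer accepts an encoding argument (it was removed in Python 3.9), so the bytes have to be decoded first. A minimal sketch of that step, assuming the body really is windows-1251 as observed above; parse_json_body is a hypothetical helper name and the sample payload is made up for illustration, not real output from the service:

```python
import json


def parse_json_body(body, encoding="windows-1251"):
    """Decode raw response bytes with the given encoding, then parse the JSON."""
    return json.loads(body.decode(encoding))


# Made-up sample bytes standing in for response.body
sample = '["Москва"]'.encode("windows-1251")
print(parse_json_body(sample))
```

In the spider, this would be called as parse_json_body(response.body) inside ati_from.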