How to get value and textfield from combobox with ajax?


I'm trying to write a parser for m-ati.su using Scrapy. As a first step, I have to get the values and text fields from the combo boxes named "From" and "To" for different cities. I inspected the request in Firebug and wrote:

from scrapy.spider import BaseSpider
from scrapy.http import FormRequest

class spider(BaseSpider):
    name = 'ati_su'
    start_urls = ['http://m-ati.su/Tables/Default.aspx?EntityType=Load']
    allowed_domains = ["m-ati.su"]

    def parse(self, response):
        # POST the autocomplete parameters as form data
        yield FormRequest('http://m-ati.su/Services/ATIGeoService.asmx/GetGeoCompletionList',
                          callback=self.ati_from,
                          formdata={'prefixText': 'moscow', 'count': '10', 'contextKey': 'All_0$Rus'})

    def ati_from(self, response):
        # Save the raw response body for inspection
        with open('results.txt', 'wb') as f:
            f.write(response.body)

This request fails with "500 Internal Server Error". What did I do wrong? Thanks.


Solution

  • I think you may have to add an X-Requested-With: XMLHttpRequest header to your POST request, so you can try this:

        def parse(self, response):
            yield FormRequest('http://m-ati.su/Services/ATIGeoService.asmx/GetGeoCompletionList', 
                              callback=self.ati_from, 
                              formdata={'prefixText': 'moscow', 'count': '10','contextKey':'All_0$Rus'},
                              headers={"X-Requested-With": "XMLHttpRequest"})
    

    Edit: I tried running the spider and came up with this:

    (The request body is JSON-encoded when I inspect it with Firefox, so I used Request and forced the "POST" method. The response I got was encoded in "windows-1251".)

    from scrapy.spider import BaseSpider
    from scrapy.http import Request
    import json
    
    class spider(BaseSpider):
        name = 'ati_su'
        start_urls = ['http://m-ati.su/Tables/Default.aspx?EntityType=Load']
        allowed_domains = ["m-ati.su"]
    
        def parse(self, response):
            yield Request('http://m-ati.su/Services/ATIGeoService.asmx/GetGeoCompletionList',
                          callback=self.ati_from,
                          method="POST",
                          body=json.dumps({
                                'prefixText': 'moscow',
                                'count': '10',
                                'contextKey':'All_0$Rus'
                          }),
                          headers={
                                "X-Requested-With": "XMLHttpRequest",
                                "Accept": "application/json, text/javascript, */*; q=0.01",
                                "Content-Type": "application/json; charset=utf-8",
                                "Pragma": "no-cache",
                                "Cache-Control": "no-cache",
                          })
        def ati_from(self, response):
            # Decode the windows-1251 bytes first, then parse the JSON text
            jsondata = response.body.decode("windows-1251")
            print(json.loads(jsondata))
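To make the difference between the two attempts concrete, here is a minimal standard-library sketch (no Scrapy needed) of the two request bodies and of decoding a windows-1251 response. The helper names `form_encoded_body`, `json_body`, and `decode_json_response` are hypothetical, and the sample response is made up for illustration; the real service's response shape may differ. The form-encoded string is only roughly what FormRequest would send for the same `formdata`.

```python
import json
from urllib.parse import urlencode

# Parameters taken from the question.
PARAMS = {"prefixText": "moscow", "count": "10", "contextKey": "All_0$Rus"}


def form_encoded_body(params):
    """Build a form-encoded body, the general shape FormRequest produces."""
    return urlencode(params)


def json_body(params):
    """Build a JSON body, matching a Content-Type of application/json."""
    return json.dumps(params)


def decode_json_response(raw_bytes, encoding="windows-1251"):
    """Decode the raw response bytes to text first, then parse the JSON."""
    return json.loads(raw_bytes.decode(encoding))


if __name__ == "__main__":
    print(form_encoded_body(PARAMS))
    print(json_body(PARAMS))
    # Hypothetical sample payload, NOT a captured response from the service.
    sample = json.dumps({"d": ["Москва"]}, ensure_ascii=False).encode("windows-1251")
    print(decode_json_response(sample))
```

Decoding the bytes explicitly before calling `json.loads` avoids relying on the `encoding` keyword argument, which newer Python versions no longer accept.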