
Scrapy request not going through


I'm not sure how to frame this question exactly. I am a beginner at web scraping and I am trying to crawl a website using Python Scrapy. The website is dynamic and relies on JavaScript, so I can't retrieve any data using basic XPath and CSS selectors.

I am trying to mimic the API request in my spider by requesting the URL that returns the data as a JSON object. That request fails with an "HTTP status code is not handled or not allowed" error. I think I am calling the wrong URL. Nine times out of ten, calling the JSON URL directly has worked for me. What can I do differently? The request has query parameters and form data items listed in the headers section of the browser's network tab, and the URL doesn't even look like a valid website URL: it starts with https://ih3kc909gb-dsn.algolia.net/1/indexes... I know this is a long question, but I could really use some help with getting a response from this URL.


Solution

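  • Use the start_requests() method instead of the start_urls attribute. Requests generated from start_urls are plain GET requests, while this endpoint expects a POST with a body. The Scrapy documentation for start_requests() has more detail. After that, all you need to do is send a POST request with the same headers and body the browser sends.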

    Code

    import scrapy
    
    class carswitch(scrapy.Spider):
        name = 'car'
    
        headers = {
            "Connection": "keep-alive",
            "Pragma": "no-cache",
            "Cache-Control": "no-cache",
            "sec-ch-ua": "\" Not;A Brand\";v=\"99\", \"Google Chrome\";v=\"91\", \"Chromium\";v=\"91\"",
            "accept": "application/json",
            "sec-ch-ua-mobile": "?0",
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36",
            "content-type": "application/x-www-form-urlencoded",
            "Origin": "https://carswitch.com",
            "Sec-Fetch-Site": "cross-site",
            "Sec-Fetch-Mode": "cors",
            "Sec-Fetch-Dest": "empty",
            "Referer": "https://carswitch.com/",
            "Accept-Language": "en-US,en;q=0.9"
        }
    
        body = '{"params":"query=&hitsPerPage=24&page=0&numericFilters=%5B%22country_id%3D1%22%2C%22used_car%20%3D%201%22%5D&facetFilters=&typoTolerance=&tagFilters=%5B%5D&attributesToHighlight=%5B%5D&attributesToRetrieve=%5B%22make%22%2C%22make_ar%22%2C%22model%22%2C%22model_ar%22%2C%22year%22%2C%22trim%22%2C%22displayTrim%22%2C%22colorPaint%22%2C%22bodyType%22%2C%22salePrice%22%2C%22transmissionType%22%2C%22GPS%22%2C%22carID%22%2C%22inspectionID%22%2C%22inspectionStatus%22%2C%22rate%22%2C%22certified_dealer_id%22%2C%22dealer_category%22%2C%22used_car%22%2C%22new%22%2C%22top_condition%22%2C%22featured%22%2C%22photo%22%2C%22modifiedPlace%22%2C%22city%22%2C%22mileage%22%2C%22urgent_sales%22%2C%22price_dropped%22%2C%22urgent_sales_days%22%2C%22urgent_sales_end_date%22%2C%22date%22%2C%22negotiable%22%2C%22oldPrice%22%2C%22zero_downpayment%22%2C%22cashOnly%22%2C%22hasPriceGuidance%22%2C%22dealerOffer%22%2C%22maxPrice%22%2C%22fairPrice%22%2C%22pricey_deal%22%2C%22fair_deal%22%2C%22good_deal%22%2C%22great_deal%22%2C%22dealership_info%22%2C%22logo_small%22%2C%22GCCspecs%22%2C%22country%22%2C%22export%22%2C%22monthly_price%22%5D"}'
    
        def start_requests(self):
            url = 'https://ih3kc909gb-dsn.algolia.net/1/indexes/All_Carswitch_Cars/query?x-algolia-agent=Algolia%20for%20JavaScript%20(3.33.0)%3B%20Browser&x-algolia-application-id=IH3KC909GB&x-algolia-api-key=493a9bbc57331df3b278fa39c1dd8f2d'
    
            # Use scrapy.Request: only the scrapy module is imported, so a bare Request would raise NameError
            yield scrapy.Request(url=url, method='POST', headers=self.headers, body=self.body, callback=self.parse)
    
    
        def parse(self, response):
            # The endpoint returns JSON; print the raw bytes to confirm the request went through
            print(response.body)
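
Once the POST goes through, the hand-encoded body and the raw print can be replaced with small helpers. The following is a minimal sketch, not part of the original answer: build_algolia_body and extract_hits are hypothetical names, the body keeps only a few of the original parameters, and the hits, page and nbPages keys are assumed from Algolia's usual search response, so check them against the actual response before relying on them.

```python
import json
from urllib.parse import quote, urlencode


def build_algolia_body(page=0, hits_per_page=24):
    """Build the JSON request body for one page of results (hypothetical helper)."""
    params = {
        "query": "",
        "hitsPerPage": hits_per_page,
        "page": page,
        # The filters are sent as a compact JSON list, as in the original body
        "numericFilters": json.dumps(
            ["country_id=1", "used_car = 1"], separators=(",", ":")
        ),
    }
    # quote_via=quote encodes spaces as %20 (not +), matching the original body
    return json.dumps({"params": urlencode(params, quote_via=quote)})


def extract_hits(response_text):
    """Return (hits, page, nb_pages) from the JSON response text.

    The key names are an assumption about the response shape;
    missing keys fall back to empty or zero values.
    """
    data = json.loads(response_text)
    return data.get("hits", []), data.get("page", 0), data.get("nbPages", 0)
```

Inside the spider, parse() could call extract_hits(response.text), yield each hit as an item, and, while page + 1 is less than nb_pages, yield another scrapy.Request with body=build_algolia_body(page + 1) to walk through the remaining pages.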