I am trying to scrape some data from Amazon, and I need to sort the books by number of reviews on this page: www.amazon.com/s/ref=lp_283155_nr_n_0?fst=as%3Aoff&rh=n%3A283155%2Cn%3A!1000%2Cn%3A1&bbn=1000&ie=UTF8&qid=1457964444&rnid=1000 When I parse this page with the Scrapy framework, the form tag is missing from the response, so I can't scrape it. Why is that?
my browser sees it like this: [1]: https://i.sstatic.net/sSrsK.jpg
scrapy framework sees it like this: [2]: https://i.sstatic.net/vUz2P.jpg
This is what I see when I open the page with Scrapy's open_in_browser() method.
It's weird and I have no clue what's wrong. I appreciate your help.
I tried replicating your error and found that the Scrapy shell is redirected to another link when I open the given URL. When I viewed the response, it was a completely different page from the one shown in the question, with no form tag.
This was the debug output printed by Scrapy:
2016-03-15 13:35:35 [scrapy] DEBUG: Redirecting (301) to <GET http://www.amazon.com/s?ie=UTF8&bbn=1000&page=1&rh=n%3A283155> from <GET http://www.amazon.com/s/ref=lp_283155_nr_n_0?fst=as%3Aoff&rh=n%3A283155%2Cn%3A2Cn%3A1&bbn=1000&ie=UTF8&qid=1457964444&rnid=1000>
The solution is to request the URL with a browser-like user agent instead of Scrapy's default one. Something like this:
scrapy shell -s USER_AGENT='Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.36 Safari/535.7' "http://www.amazon.com/s/ref=lp_283155_nr_n_0?fst=as%3Aoff&rh=n%3A283155%2Cn%3A2Cn%3A1&bbn=1000&ie=UTF8&qid=1457964444&rnid=1000"
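If the same override is needed in a spider rather than in the shell, one option is to set Scrapy's `USER_AGENT` setting in the project's settings file. The following is a minimal sketch, assuming a standard Scrapy project layout with a `settings.py`; the user-agent string is simply the one from the shell command above, and there is no guarantee that Amazon will keep serving the same page for it.

```python
# settings.py of the Scrapy project (assumes the standard project layout)

# Browser-like user agent, copied from the shell command above.
# Scrapy sends this value as the User-Agent header on its requests
# instead of its default "Scrapy/<version>" identifier.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) "
    "AppleWebKit/535.7 (KHTML, like Gecko) "
    "Chrome/16.0.912.36 Safari/535.7"
)
```

To limit the change to a single spider, the same key can instead go in that spider's `custom_settings` dictionary, for example `custom_settings = {"USER_AGENT": "..."}`.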