My intention is to follow pagination that is driven by JavaScript functions. As an example, take the URL http://events.justdial.com/events/index.php?city=Hyderabad. The pagination links are at the bottom of that page, and if you inspect the HTML you will see that they are generated by JavaScript and have their href set to #. I am trying to collect those links even though their href is #.
The following is my code:

    class justdialdotcomSpider(BaseSpider):
        name = "justdialdotcom"
        allowed_domains = ["www.justdial.com"]
        start_urls = ["http://events.justdial.com/events/index.php?city=Hyderabad"]

        def parse(self, response):
            hxs = HtmlXPathSelector(response)
            pagination = hxs.select('//div[@id="main"]/div[@id="content"]/div[@id="pagination"]/a').extract()
            print pagination,">>>>>>>>>>>>>>>>>."

When I run the above code the result is [], i.e. nothing is matched. Can anyone tell me how to follow pagination that is driven by these JavaScript onclick functions, and why the result is empty? I also noticed something weird in the HTML: one of the pagination links, for example, is the anchor tag <a onclick="jdevents.setPageNo(2)" href="#">2</a>, but when I open the page with view page source in the browser I cannot find any function named jdevents.setPageNo(2). (I expected that if we could see what the function does, we could send the same data as formdata in a request.) I am really confused and unable to get past this.
If you track the requests the page makes, you'll find POST requests to the following URL: http://events.justdial.com/events/search.php
POST data:
city:Hyderabad
cat:0
area:0
fromDate:
toDate:
subCat:0
pageNo:2
fetch:events
and the response is in JSON format.
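To make the shape of that request concrete, here is a minimal sketch (Python 3, standard library only) that builds the same form fields for a given page and URL-encodes them the way a form POST body is encoded. The helper name build_search_payload is hypothetical and only for illustration; the field names and values are the ones listed above.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://events.justdial.com/events/search.php"


def build_search_payload(page_no, city="Hyderabad", fetch="events"):
    """Return the form fields listed above for one page of results.

    Hypothetical helper: it only mirrors the POST data seen in the browser.
    """
    return {
        "city": city,
        "cat": "0",
        "area": "0",
        "fromDate": "",
        "toDate": "",
        "subCat": "0",
        "pageNo": str(page_no),  # form values are sent as strings
        "fetch": fetch,
    }


# Encode the fields as an application/x-www-form-urlencoded body
body = urlencode(build_search_payload(2))
print(body)
```

Changing only pageNo is what moves from one page of results to the next, which is what the onclick handler on each pagination link appears to trigger.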
So your code should look like the following:

    import json

    from scrapy.http import FormRequest
    from scrapy.spider import BaseSpider


    class justdialdotcomSpider(BaseSpider):
        name = "justdialdotcom"
        allowed_domains = ["justdial.com"]
        start_urls = ["http://events.justdial.com/events/search.php"]

        # Initial request: ask for the number of events per area
        def parse(self, response):
            return [FormRequest(url="http://events.justdial.com/events/search.php",
                                formdata={'fetch': 'area',
                                          'pageNo': '1',
                                          'city': 'Hyderabad',
                                          'cat': '0',
                                          'area': '0',
                                          'fromDate': '',
                                          'toDate': '',
                                          'subCat': '0'},
                                callback=self.area_count)]

        # Get the total count and paginate through the events
        def area_count(self, response):
            total_count = 0
            for area in json.loads(response.body):
                total_count += int(area["count"])
            # 10 events per page; round up without requesting an extra page
            pages_count = (total_count + 9) // 10
            for page in range(1, pages_count + 1):
                yield FormRequest(url="http://events.justdial.com/events/search.php",
                                  formdata={'fetch': 'events',
                                            'pageNo': str(page),
                                            'city': 'Hyderabad',
                                            'cat': '0',
                                            'area': '0',
                                            'fromDate': '',
                                            'toDate': '',
                                            'subCat': '0'},
                                  callback=self.parse_events)

        # Parse one page of events and request the details of each event
        def parse_events(self, response):
            events = json.loads(response.body)
            # Skip the first element, which is apparently not an event entry
            events.pop(0)
            for event_details in events:
                yield FormRequest(url="http://events.justdial.com/events/search.php",
                                  formdata={'fetch': 'event',
                                            'eventId': str(event_details["id"])},
                                  callback=self.parse_event)

        # Parse the details of a single event
        def parse_event(self, response):
            event_details = json.loads(response.body)
            items = []
            # Build your item from event_details here. Product stands for
            # your own Item class, e.g.:
            # item = Product()
            # items.append(item)
            return items
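
The page count in area_count is the part that is easiest to get wrong, so here is a small standalone sketch of that arithmetic (Python 3, no Scrapy needed). The sample JSON is made up for illustration; only the "count" key comes from the spider code above, and the page size of 10 is the one that code assumes. The function name count_pages is hypothetical.

```python
import json

EVENTS_PER_PAGE = 10  # page size assumed by the spider above


def count_pages(area_json, per_page=EVENTS_PER_PAGE):
    """Sum the per-area counts and return how many pages cover them."""
    areas = json.loads(area_json)
    total = sum(int(area["count"]) for area in areas)
    # Ceiling division: 20 events need 2 pages, 21 events need 3
    return (total + per_page - 1) // per_page


# Made-up response body with the same layout the spider expects
sample = '[{"count": "12"}, {"count": "8"}, {"count": "3"}]'
print(count_pages(sample))  # 23 events in total
```

Note that a formula of the form total / 10 + 1 asks for one page too many whenever the total is an exact multiple of 10, which is why the sketch rounds up instead.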