To find out how many bills each member of parliament has signed, I'm trying to write a scraper for the members of parliament that works in 3 layers: (1) the page listing all MPs, (2) each MP's own page, and (3) the page listing the bills that carry that MP's signature.
At the last layer, I'm trying to return the number of bills by counting the elements matched by the relevant XPath selector with the len() function. But apparently I can't assign the number returned from (3) to a variable to be eventually yielded.
Scrapy returns just the link accessed rather than the number that I want the function to return. Why is that? Can't I write a statement like X = Request(url, callback=function), where the callback that receives the response returns an integer? How can I fix it?
I want a number in place of yielded statements like this one: <GET https://www.tbmm.gov.tr/Milletvekilleri/KanunTeklifiUyeninImzasiBulunanTeklifler?donemKod=27&sicil=UqVZp9Fvweo=>
Thanks in advance.
'''
from scrapy import Spider
from scrapy.http import Request

class MvSpider(Spider):
    name = 'mv'
    allowed_domains = ['tbmm.gov.tr'] #website of the parliament
    start_urls = ['https://www.tbmm.gov.tr/Milletvekilleri/liste'] #the link which has the list of MPs

    def parse(self, response):
        mv_linkler = response.xpath('//div[@class="col-md-8"]/a/@href').getall()
        for link in mv_linkler:
            mutlak_link = response.urljoin(link) #absolute url
            yield Request(mutlak_link, callback = self.mv_analiz)

    def mv_analiz(self, response): #function to analyze the MP
        kteklif_link_path = response.xpath("//a[contains(text(),'İmzası Bulunan Kanun Teklifleri')]/@href").get()
        kteklif_link = response.urljoin(kteklif_link_path)
        ktsayisi = int(Request(kteklif_link, callback = self.kt_say)) #the value of the number of bill proposals to be requested

    def kt_say(self,response):
        kteklifler = response.xpath("//tr[@valign='TOP']")
        return len(kteklifler)
'''
You can't. furas's explanation pretty much covers why, and I don't have anything to add. You need to carry the item along with the request and yield it from the last callback, something like this:
'''
from scrapy import Spider
from scrapy.http import Request

class MvSpider(Spider):
    name = 'mv'
    allowed_domains = ['tbmm.gov.tr'] #website of the parliament
    start_urls = ['https://www.tbmm.gov.tr/Milletvekilleri/liste'] #the link which has the list of MPs

    def parse(self, response):
        mv_linkler = response.xpath('//div[@class="col-md-8"]/a/@href').getall()
        for link in mv_linkler:
            mutlak_link = response.urljoin(link) #absolute url
            yield Request(mutlak_link, callback=self.mv_analiz)

    def mv_analiz(self, response): #function to analyze the MP
        kteklif_link_path = response.xpath("//a[contains(text(),'İmzası Bulunan Kanun Teklifleri')]/@href").get()
        kteklif_link = response.urljoin(kteklif_link_path)
        item = {} #filled in by the last callback
        req = Request(kteklif_link, callback=self.kt_say) #request the page that lists the bill proposals
        req.meta['item'] = item #hand the item over to kt_say
        yield req

    def kt_say(self, response):
        kteklifler = response.xpath("//tr[@valign='TOP']")
        item = response.meta['item'] #the item created in mv_analiz
        item['ktsayisi'] = len(kteklifler) #number of bill proposals
        yield item
'''
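
To see why `int(Request(...))` cannot work, here is a toy model of the callback chain in plain Python, with no Scrapy involved. `FakeRequest`, `FakeResponse`, `FAKE_SITE` and `crawl` are all hypothetical names made up for this sketch, and the page data is invented. The real Scrapy engine is asynchronous and far more involved. The only point is that a request is a plain object describing work to be done later, so the count only exists inside the last callback.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a request: it only describes work, it fetches nothing."""
    url: str
    callback: Callable
    meta: Dict[str, Any] = field(default_factory=dict)


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a response handed to a callback."""
    url: str
    links: List[str]
    rows: int
    meta: Dict[str, Any]


# Invented data: each "page" has outgoing links and a number of table rows.
FAKE_SITE = {
    "list": {"links": ["mp/a", "mp/b"], "rows": 0},
    "mp/a": {"links": ["bills/a"], "rows": 0},
    "mp/b": {"links": ["bills/b"], "rows": 0},
    "bills/a": {"links": [], "rows": 3},
    "bills/b": {"links": [], "rows": 5},
}


def parse(response):
    # Layer 1: one request per MP link.
    for link in response.links:
        yield FakeRequest(link, callback=mv_analiz)


def mv_analiz(response):
    # Layer 2: create the item and pass it along with the next request.
    item = {"mv": response.url}
    yield FakeRequest(response.links[0], callback=kt_say, meta={"item": item})


def kt_say(response):
    # Layer 3: only here is the count known, so only here is the item yielded.
    item = response.meta["item"]
    item["ktsayisi"] = response.rows
    yield item


def crawl(start_url, callback):
    """Toy engine: run callbacks, queue yielded requests, collect yielded items."""
    queue = deque([FakeRequest(start_url, callback)])
    items = []
    while queue:
        req = queue.popleft()
        page = FAKE_SITE[req.url]
        resp = FakeResponse(req.url, page["links"], page["rows"], req.meta)
        for result in req.callback(resp):
            if isinstance(result, FakeRequest):
                queue.append(result)   # more work to do later
            else:
                items.append(result)   # a finished item
    return items


items = crawl("list", parse)
print(items)
```

Calling `int(FakeRequest("bills/a", kt_say))` in this model raises a `TypeError` for the same reason as in the question: the request object has not been fetched and carries no number. As a side note, Scrapy 1.7 and later also accept a `cb_kwargs` dict on `Request` as an alternative to `meta` for passing values to a callback.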