Can Response not return an integer value in Scrapy?


To find out how many bills each member of parliament (MP) has signed, I'm trying to write a scraper for the members of parliament that works in 3 layers:

  1. Accessing the link for each MP from the list
  2. From (1) accessing the page with information including the bills the MP has a signature on
  3. From (2), accessing the page that lists the bill proposals bearing the MP's signature, counting them, and assigning the count to the ktsayisi variable (the problem occurs here)

At the last layer, I'm trying to return the number of bills by counting the elements matched by the relevant XPath selector with len(). But apparently I can't assign the number returned from (3) to a variable that is eventually yielded.

Scrapy returns just the link accessed rather than the number that I want the function to return. Why is that? Can't I write a statement like X = Request(url, callback=function), where the callback that receives the Response returns an integer? How can I fix it?

I want a number in place of yielded statements like this one: <GET https://www.tbmm.gov.tr/Milletvekilleri/KanunTeklifiUyeninImzasiBulunanTeklifler?donemKod=27&sicil=UqVZp9Fvweo=>

Thanks in advance.

'''

from scrapy import Spider
from scrapy.http import Request


class MvSpider(Spider):
    name = 'mv'
    allowed_domains = ['tbmm.gov.tr']  #website of the parliament
    start_urls = ['https://www.tbmm.gov.tr/Milletvekilleri/liste'] #the link which has the list of MPs

    def parse(self, response):
        mv_linkler = response.xpath('//div[@class="col-md-8"]/a/@href').getall()

        for link in mv_linkler:
            mutlak_link = response.urljoin(link)  #absolute url

            yield Request(mutlak_link, callback = self.mv_analiz)

    def mv_analiz(self, response): #function to analyze the MP
        kteklif_link_path = response.xpath("//a[contains(text(),'İmzası Bulunan Kanun Teklifleri')]/@href").get()
        kteklif_link = response.urljoin(kteklif_link_path)

        ktsayisi = int(Request(kteklif_link, callback = self.kt_say)) #the value of the number of bill proposals to be requested

    def kt_say(self,response):
        kteklifler = response.xpath("//tr[@valign='TOP']")

        return len(kteklifler)

'''


Solution

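  • You can't. furas's explanation pretty much covers why, and I don't have anything to add. The short version is that a Request object only describes a download that Scrapy schedules and runs later, so creating one never hands you the callback's result. Instead, carry an item along with the request and yield it from the last callback. You need to do something like this: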

    from scrapy import Spider
    from scrapy.http import Request
    
    
    class MvSpider(Spider):
        name = 'mv'
        allowed_domains = ['tbmm.gov.tr']  #website of the parliament
        start_urls = ['https://www.tbmm.gov.tr/Milletvekilleri/liste'] #the link which has the list of MPs
    
        def parse(self, response):
            mv_linkler = response.xpath('//div[@class="col-md-8"]/a/@href').getall()
    
            for link in mv_linkler:
                mutlak_link = response.urljoin(link)  #absolute url
    
                yield Request(mutlak_link, callback=self.mv_analiz)
    
        def mv_analiz(self, response): #function to analyze the MP
            kteklif_link_path = response.xpath("//a[contains(text(),'İmzası Bulunan Kanun Teklifleri')]/@href").get()
            kteklif_link = response.urljoin(kteklif_link_path)
            item = {}
            req = Request(kteklif_link, callback=self.kt_say) #the value of the number of bill proposals to be requested
            req.meta['item'] = item
            yield req
    
        def kt_say(self, response):
            kteklifler = response.xpath("//tr[@valign='TOP']")
            item = response.meta['item']
            item['ktsayisi'] = len(kteklifler)
            yield item
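
To make the reasoning concrete, here is a toy model in plain Python of how a callback-based crawler hands results around. It is only a sketch and not Scrapy's real engine: `FakeRequest`, `FakeResponse`, `run`, the `PAGES` data and the `mp_url` key are all made-up names for illustration. It shows that building a request returns immediately (so `int(...)` on it fails), and that the count only becomes available inside the last callback, where the item carried in `meta` is finally yielded.

```python
from collections import deque


class FakeRequest:
    """Hypothetical stand-in for scrapy.Request: it only stores what to do later."""

    def __init__(self, url, callback, meta=None):
        self.url = url
        self.callback = callback
        self.meta = meta or {}


class FakeResponse:
    """Hypothetical stand-in for a response; it carries the request's meta along."""

    def __init__(self, request, rows):
        self.url = request.url
        self.meta = request.meta
        self.rows = rows


# Made-up "pages": each URL maps to the table rows found on it.
PAGES = {"mp/1": [], "bills/1": ["bill A", "bill B", "bill C"]}


def run(start_request):
    """Toy engine: take a request, 'download' it, call its callback, sort the output."""
    queue, items = deque([start_request]), []
    while queue:
        request = queue.popleft()
        response = FakeResponse(request, PAGES[request.url])
        for result in request.callback(response) or []:
            if isinstance(result, FakeRequest):
                queue.append(result)  # follow-up request: schedule it for later
            else:
                items.append(result)  # anything else counts as a scraped item
    return items


def mv_analiz(response):
    item = {"mp_url": response.url}
    # Building the request returns right away; kt_say has not run at this point.
    yield FakeRequest("bills/1", callback=kt_say, meta={"item": item})


def kt_say(response):
    item = response.meta["item"]
    item["ktsayisi"] = len(response.rows)  # the count exists only here
    yield item


# A request object is not a number, so converting it fails.
try:
    int(FakeRequest("bills/1", callback=kt_say))
    raised_type_error = False
except TypeError:
    raised_type_error = True

items = run(FakeRequest("mp/1", callback=mv_analiz))
print(raised_type_error, items)
```

The same shape applies to the spider above: `mv_analiz` yields the request with the item attached, and `kt_say` is the only place where the number can be written into the item and yielded.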