python web-scraping scrapy fancybox

How can I scrape the text from this popup window? [Python and Scrapy]


Please note - I'm very inexperienced and this is my first 'real' project.

I'll try to explain my problem as best I can; apologies if some of the terms are incorrect.

I'm trying to scrape the following webpage - https://www.eaab.org.za/agent_agency_search?type=Agents&search_agent=+&submit_agent_search=GO

I can scrape the 'Name' and 'Status', but I also need to get some of the information in the 'Full Details' popup window.

I have noticed that when clicking on the 'Full Details' button the URL stays the same.

Below is what my code looks like:

import scrapy
from FirstScrape.items import FirstscrapeItem

class FirstSpider(scrapy.Spider):
    name = "spiderman"
    start_urls = [
        "https://www.eaab.org.za/agent_agency_search?type=Agents&search_agent=+&submit_agent_search=GO"
    ]
    
    def parse(self, response):
        item = FirstscrapeItem()
        item['name'] = response.xpath("//tr[@class='even']/td[1]/text()").get()
        item['status'] = response.xpath("//tr[@class='even']/td[2]/text()").get()
        #first refers to firstname in the popup window
        item['first'] = response.xpath("//div[@class='result-list default']/tbody/tr[2]/td[2]/text()").get()
        
        
        return item

I launch my code from the terminal and export it to a .csv file.

Not sure if this will help but this is the popup / fancy box window:

[Image: popup window]

Do I need to use Selenium to click on the button or am I just missing something? Any help will be appreciated.

I'm very eager to learn more about Python and scraping.

Thank you.


Solution

  • Each 'Full Details' link has an href attribute that points to the page the popup loads. Extract those URLs and request each one. Maybe this helps:

    import scrapy


    class FirstSpider(scrapy.Spider):
        name = "spiderman"
        start_urls = [
            "https://www.eaab.org.za/agent_agency_search?type=Agents&search_agent=+&submit_agent_search=GO"
        ]

        def parse(self, response):
            # Each 'Full Details' link carries the popup's URL in its href attribute.
            for href in response.css(".agent-detail::attr(href)").getall():
                # response.follow resolves a relative href against the page URL.
                yield response.follow(href, callback=self.parse_data)

        def parse_data(self, response):
            # Print the text of every table cell on the details page.
            print(response.css("td::text").getall())
            print("-----------------------------------")
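
Since the question exports to a .csv file, printing the cells is probably not enough; the callback would need to yield something Scrapy can export. Below is a minimal sketch of one way to turn the cell texts into a record. It assumes each row of the details table holds a label cell followed by a value cell, which I have not verified against the live page. The helper name `rows_to_record` and the sample rows are made up for illustration.

```python
def rows_to_record(rows):
    """Build {label: value} from rows of cell texts.

    Assumption: each useful row is [label, value]. Rows with fewer than
    two non-empty cells (headers, spacers) are skipped.
    """
    record = {}
    for cells in rows:
        # Drop whitespace-only cells and trim the rest.
        cleaned = [c.strip() for c in cells if c.strip()]
        if len(cleaned) >= 2:
            label = cleaned[0].rstrip(":")  # "First Name:" -> "First Name"
            record[label] = cleaned[1]
    return record


if __name__ == "__main__":
    # Made-up sample data standing in for the details table.
    sample = [["First Name:", " Jane "], ["Only one cell"], ["Status", "Registered"]]
    print(rows_to_record(sample))
```

In `parse_data` this could be used as `yield rows_to_record(row.css("td::text").getall() for row in response.css("tr"))`. Yielding a dict instead of printing lets `scrapy crawl spiderman -o agents.csv` write the rows out (the file name is just an example). If the real table is laid out differently, the pairing logic would need adjusting.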