
Scrapy response returns the same HTML after changing the page with Selenium (Python)


On a web page with a Show More button, I click the button in a loop until it no longer appears, so the entire page is visible. I then need to scrape some data, but the data I get is the same as before clicking Show More.

This is the code that does this:

    bodyBefore = response.xpath('/body').get()

    # Click the Show More button until it is no longer present
    showmore_btn = self.driver.find_elements_by_xpath(
        "//a[@class='event__more event__more--static']")

    while len(showmore_btn) > 0:
        showmore_btn[0].send_keys(Keys.ENTER)
        # Increase the delay if the click has no effect (e.g. slow internet connection)
        time.sleep(5)
        showmore_btn = self.driver.find_elements_by_xpath(
            "//a[@class='event__more event__more--static']")

    bodyAfter = response.xpath('/body').get()

I can't get the new HTML in order to scrape it: comparing bodyBefore and bodyAfter shows that they are identical.

Does anyone know how to do this?

The URL I'm scraping is: https://www.flashscore.com/football/england/premier-league-2018-2019/results/

In this case I want to scrape the URL of each match that appears after clicking Show More.


Solution

  • First find the main table, then all the <div> elements that hold the rows of data. Next, loop over the elements in each row to read their text. I also added a progress indicator to the loop, hope you enjoy it :)

    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    import os
    import time
    import sys
    
    
    driver = webdriver.Chrome(executable_path=os.path.abspath(os.getcwd()) + "/chromedriver")
    driver.get("https://www.flashscore.com/football/england/premier-league-2018-2019/results/")
    
    # extend table
    show_more_buttons = driver.find_elements_by_xpath("//a[@class='event__more event__more--static']")
    while len(show_more_buttons) > 0:
        show_more_buttons[0].send_keys(Keys.ENTER)
        time.sleep(2)
        show_more_buttons = driver.find_elements_by_xpath("//a[@class='event__more event__more--static']")
    
    # get table and events
    table = driver.find_element_by_xpath('//*[@id="live-table"]/div[1]/div/div')
    events = table.find_elements_by_class_name('event__match.event__match--static.event__match--oneLine')
    
    # loop over events and collect data
    count = 1
    data = []
    for item in events:
        # named event_time so the imported time module is not shadowed
        event_time = item.find_element_by_class_name('event__time').text
        participant_home = item.find_element_by_class_name('event__participant.event__participant--home').text
        event_scores = item.find_element_by_class_name('event__scores.fontBold').text
        participant_away = item.find_element_by_class_name('event__participant.event__participant--away').text
        event_part = item.find_element_by_class_name('event__part').text
        data.append([event_time, participant_home, event_scores.replace('\n', ''), participant_away, event_part])
        sys.stdout.write('\r')
        sys.stdout.write("progress: %.2f %%" % ((count/len(events))*100))
        sys.stdout.flush()
        count += 1
    
    for item in data:
        print(item)
    

    Output:

    ['12.05. 16:00', 'Brighton', '1 - 4', 'Manchester City', '(1 - 2)']
    ['12.05. 16:00', 'Burnley', '1 - 3', 'Arsenal', '(0 - 0)']
    ..
    ..
    ..
    ['11.08. 16:00', 'Watford', '2 - 0', 'Brighton', '(1 - 0)']
    ['11.08. 13:30', 'Newcastle', '1 - 2', 'Tottenham', '(1 - 2)']
    ['10.08. 21:00', 'Manchester Utd', '2 - 1', 'Leicester', '(1 - 0)']
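
A note on why bodyBefore and bodyAfter match in the question: both are read from `response`, which holds the HTML Scrapy downloaded, while the Selenium clicks only change the DOM inside the browser. The sketch below re-reads the HTML from `driver.page_source` after the clicking loop instead. `LinkCollector` and `links_in_live_page` are hypothetical names, the standard-library `HTMLParser` is used only to keep the example free of extra dependencies, and it assumes the match links are plain `<a href>` elements, which the source does not confirm.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def links_in_live_page(driver):
    """Return all link targets found in the browser's current DOM.

    `driver` is any object with a `page_source` attribute, such as a
    Selenium WebDriver. Call this after the Show More loop has finished,
    so the HTML includes the rows that the clicks loaded.
    """
    collector = LinkCollector()
    collector.feed(driver.page_source)  # live HTML, not Scrapy's response
    return collector.hrefs
```

Inside a Scrapy spider, the same string can be wrapped as `scrapy.Selector(text=self.driver.page_source)`, so the usual `.xpath()` calls run against the expanded page rather than the original response.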