python html selenium chrome-web-driver

Python Selenium Webdriver: Unable to fetch data for all hidden elements using "textContent"


I want to scrape data from Netflix to get the following information:

    1. Show name
    2. Season
    3. Episode name for each season
    4. URL for each episode
    5. Duration of each show

Selenium version: 3.141.0. Python version: 3.6.6. I am using the Chrome WebDriver.

The tool is able to log in, search for a specific show, and click on the Episodes tab, as shown below:

[Screenshot: sample episodes for a show]

The HTML elements for each episode are as follows:

div class="slider-item slider-item-0" 
div class="slider-item slider-item-1"
div class="slider-item slider-item-2"
div class="slider-item slider-item-3"
div class="slider-item slider-item-"
div class="slider-item slider-item-"
div class="slider-item slider-item-"
div class="slider-item slider-item-"

After the fourth item, the remaining elements are hidden from the screen.

To locate the elements I am using the code below, which returns the list of episode names for each show:

driver.find_elements_by_xpath("//div[@class='episodeTitle']//p[@class ='ellipsized']")

I know that the text of hidden elements can be read through innerHTML or textContent, as described at https://yizeng.me/2014/04/08/get-text-from-hidden-elements-using-selenium-webdriver/:

print(demo_div.get_attribute('innerHTML'))
driver.execute_script("return arguments[0].innerHTML", demo_div)

print(demo_div.get_attribute('textContent'))
driver.execute_script("return arguments[0].textContent", demo_div)

THE ISSUE: Every time, the details of the last two or more episodes are missing. I have tried both of the above techniques to read the hidden elements, but with no luck. I have also used WebDriver implicit and explicit waits, but some episodes still go missing.

Code snippet to get the episode names:

e8 = driver.find_elements_by_xpath("//div[@class='episodeTitle']//p[@class ='ellipsized']")

Appreciate the help.


Solution

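  • These divs are loaded dynamically when the arrow is clicked, so the later episodes are not in the DOM until then, and textContent cannot read them.

    1. Extract the list of currently visible items.
    2. Click the arrow button and wait for the list to be replaced or loaded.
    3. Extract the new list, and repeat until no new items appear.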
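
The steps above can be sketched roughly as follows. This is a minimal sketch, not a tested Netflix scraper: the function names collect_across_pages and scrape_episode_titles are made up for illustration, the XPath for the arrow button is something you must find yourself by inspecting the page, and the fixed pause is a crude stand-in for a proper explicit wait. It also assumes episode titles are unique within a season, since titles are de-duplicated by text.

```python
import time


def collect_across_pages(read_visible, advance, max_pages=50):
    """Collect items from a paged slider in first-seen order.

    read_visible: callable returning the strings currently in the DOM.
    advance: callable that moves the slider forward and returns False
             when it cannot advance.
    """
    seen = []
    for _ in range(max_pages):
        new_items = []
        for text in read_visible():
            if text and text not in seen and text not in new_items:
                new_items.append(text)
        seen.extend(new_items)
        # Stop when a page adds nothing new (for example, the slider
        # wrapped around) or when there is no way to advance.
        if not new_items or not advance():
            break
    return seen


def scrape_episode_titles(driver, next_arrow_xpath, pause=1.0):
    """Wire the helper to a live Selenium driver on the Episodes tab.

    next_arrow_xpath is an assumption: supply the real locator of the
    slider's "next" arrow after inspecting the page.
    """
    # Imported here so collect_across_pages works without Selenium installed.
    from selenium.common.exceptions import WebDriverException

    title_xpath = "//div[@class='episodeTitle']//p[@class='ellipsized']"

    def read_visible():
        elements = driver.find_elements("xpath", title_xpath)
        return [(el.get_attribute("textContent") or "").strip() for el in elements]

    def advance():
        try:
            driver.find_element("xpath", next_arrow_xpath).click()
        except WebDriverException:
            # Arrow missing or not clickable: treat as the end of the slider.
            return False
        time.sleep(pause)  # crude pause while the next batch loads
        return True

    return collect_across_pages(read_visible, advance)
```

Once the driver is logged in and on the Episodes tab, a call would look like scrape_episode_titles(driver, "<XPath of the next arrow>"), where the XPath is a placeholder to fill in. Keeping the paging loop separate from the Selenium calls makes it possible to check the loop with fake callables before running it against the live site.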