I want to scrape data from Netflix to get the following information:
1. Show name
2. Season
3. Episode name for each season
4. URL for each episode
5. Duration for each show

Selenium version: 3.141.0, Python version: 3.6.6, using Chrome WebDriver.
The tool is able to log in, search for the specific show, and click on the Episodes tab, as shown below:
[Screenshot of sample episodes for a show]
The HTML elements for each episode are as follows:
div class="slider-item slider-item-0"
div class="slider-item slider-item-1"
div class="slider-item slider-item-2"
div class="slider-item slider-item-3"
div class="slider-item slider-item-"
div class="slider-item slider-item-"
div class="slider-item slider-item-"
div class="slider-item slider-item-"
After the 4th item, the remaining elements are hidden from the screen.
To locate the elements I am using the code below:
driver.find_elements_by_xpath("//div[@class='episodeTitle']//p[@class ='ellipsized']")
This gives the list of episode names for each show.
I know that the text of hidden elements can be read with either of the following approaches:
print(demo_div.get_attribute('innerHTML'))
driver.execute_script("return arguments[0].innerHTML", demo_div)
print(demo_div.get_attribute('textContent'))
driver.execute_script("return arguments[0].textContent", demo_div)
These are taken from the link below:
https://yizeng.me/2014/04/08/get-text-from-hidden-elements-using-selenium-webdriver/
THE ISSUE: Every time, the details of the last two or more episodes are missing.
I have tried both of the above techniques to get the hidden elements, but with no luck.
I have also used WebDriver implicit and explicit waits, but some episodes still go missing.
Code snippet to get the episode names:
e8 = driver.find_elements_by_xpath("//div[@class='episodeTitle']//p[@class ='ellipsized']")
Appreciate the help.
Note: these divs are loaded dynamically when the arrow is clicked.
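Since the items only load on the arrow click, a rough sketch of what I think is needed is a loop that reads the titles currently in the DOM, clicks the arrow, and repeats until a click brings nothing new. This is only a sketch and not verified against the real page: NEXT_ARROW_XPATH is a hypothetical locator I made up for the arrow, and collect_episode_titles, max_clicks and pause are illustrative names. It reads textContent so that off-screen items are included, and it only uses driver.find_elements_by_xpath and driver.execute_script, which exist in Selenium 3.141.0.

```python
import time

# Same XPath as in the question for the episode title paragraphs.
TITLE_XPATH = "//div[@class='episodeTitle']//p[@class ='ellipsized']"

# ASSUMPTION: hypothetical locator for the slider's "next" arrow.
# Inspect the real page and replace this with the actual element.
NEXT_ARROW_XPATH = "//span[contains(@class, 'handleNext')]"


def collect_episode_titles(driver,
                           title_xpath=TITLE_XPATH,
                           next_arrow_xpath=NEXT_ARROW_XPATH,
                           max_clicks=20,
                           pause=1.0):
    """Collect episode titles, clicking the arrow until nothing new loads."""
    titles = []
    for attempt in range(max_clicks + 1):
        found_new = False
        for element in driver.find_elements_by_xpath(title_xpath):
            # textContent is returned even when the element is not displayed.
            text = (element.get_attribute("textContent") or "").strip()
            if text and text not in titles:
                titles.append(text)
                found_new = True

        # After at least one click, stop once a pass adds no new titles
        # (this also covers a slider that wraps around to the start).
        if attempt > 0 and not found_new:
            break

        arrows = driver.find_elements_by_xpath(next_arrow_xpath)
        if not arrows or attempt == max_clicks:
            break

        # Click through JavaScript so an overlapping element cannot block it.
        driver.execute_script("arguments[0].click();", arrows[0])
        # Crude fixed pause for the next batch; an explicit wait may be better.
        time.sleep(pause)
    return titles
```

I would call it right after the Episodes tab is clicked, e.g. titles = collect_episode_titles(driver). One caveat of this sketch: titles are de-duplicated by text, so two episodes with exactly the same name would be counted once.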