Tags: python, selenium-webdriver, web-scraping, firefox, webdriverwait

WebDriverWait gets the title, but JavaScript (or something else) changes the browser's title after the page has loaded


I'm using Selenium to rename media and sort it into folders based on the page title. However, the page is still loading content in the background, and the title changes after Firefox has finished downloading and displaying that content.

The problem happens whenever I click an episode of a docuseries other than the first one. My code returns the episode name instead of the series title. Once the background content finishes loading, both the browser's title and the text of the HTML element I'm after change to the series title, which is what I want.

I've been searching Google and Stack Overflow for days. I've tried almost every part of the Selenium module and inspected every part of the page, hoping to find something that would make Selenium wait for the content to finish loading, with no luck.

An answer to another question recommended using WebDriverWait with expected_conditions and avoiding time.sleep with Selenium, and I understand why: even on a fast connection, many things can slow down a page load, which makes time.sleep inconsistent.

I started out by waiting on the title itself:

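```python
import selenium.webdriver.support.expected_conditions as ec
from selenium.webdriver import Firefox
from selenium.webdriver.support.wait import WebDriverWait

browser = Firefox()
wait = WebDriverWait(driver = browser, timeout = 30)

current_title = browser.title  # title of the page before navigating
browser.get('https://curiositystream.com/video/3558')
wait.until_not(ec.title_is(current_title))  # returns as soon as the title first changes
```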

For now, I've settled on the following. The problem happens less often with it, but it still happens.

```python
print(wait.until(
    ec.visibility_of_element_located((
        'xpath',
        '//button[@aria-expanded="false" and @class="inline-block cursor-pointer"]'
        '/span[@class="leading-tight text-lg tablet:text-2xl font-normal" and contains(@aria-label,"Show")]'
    ))).text, end = ' ')

if len(browser.find_elements(
        by = 'xpath',
        value = '//div[@class="pt-4"]/p[@class="font-medium text-light pt-2"]'
)) > 0:
    print('(Docuseries)')
else:
    print('(Documentary)')
```
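Both waits above return as soon as their condition is first met, which here is the episode name, before the second change to the series title. One generic alternative is a custom condition that waits until a value has stopped changing for a while. The sketch below is hypothetical (`value_stable`, `read`, `quiet_period` and `clock` are illustrative names, not part of Selenium); it follows the same "function returning a predicate" shape as Selenium's own `expected_conditions`, so WebDriverWait can poll it without `time.sleep`.

```python
import time
from typing import Any, Callable


def value_stable(read: Callable[[Any], Any], quiet_period: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> Callable[[Any], Any]:
    """Hypothetical WebDriverWait condition.

    Each poll calls read(driver). Once the result has stayed the same for
    quiet_period seconds, the predicate returns it; otherwise it returns
    False so WebDriverWait keeps polling. A falsy value (e.g. an empty
    title) is returned as-is, which WebDriverWait also treats as "keep waiting".
    """
    last: Any = None
    since: float | None = None

    def _predicate(driver: Any) -> Any:
        nonlocal last, since
        value = read(driver)
        now = clock()
        if since is None or value != last:
            # First poll, or the value just changed: restart the quiet-period timer.
            last, since = value, now
            return False
        return value if now - since >= quiet_period else False

    return _predicate
```

Usage, assuming the `browser` from the question: `WebDriverWait(browser, timeout=30).until(value_stable(lambda d: d.title, quiet_period=3))` returns the settled title. For the header element, pass `lambda d: d.find_element('xpath', XPATH).text` (where `XPATH` stands for the span selector above) and add `ignored_exceptions=[exc.StaleElementReferenceException]` to WebDriverWait, since the element may be re-rendered between polls. Create a fresh condition for each wait, because it keeps state. This is still a heuristic: if the second title change arrives later than `quiet_period`, it will settle on the episode name.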

This isn't my actual code, but it reproduces the problem. I'm hoping someone is willing to dig into Selenium and Curiosity Stream to help me find a solution that works without time.sleep.

```python
from contextlib import suppress
from os import getpid, kill
from re import compile
from signal import SIGTERM
from time import sleep  # noqa

import selenium.common.exceptions as exc
import selenium.webdriver.support.expected_conditions as ec
from selenium.webdriver import Firefox
from selenium.webdriver.support.wait import WebDriverWait

# from library import Firefox

if __name__ == '__main__':
    browser = Firefox()
    wait = WebDriverWait(driver = browser, timeout = 30)

    browser.set_window_rect(x = 960, y = 10, width = 1920, height = 1580)
    browser.get('https://curiositystream.com/')

    url = compile(r'https://curiositystream.com/video/[0-9]+')

    # current_url = current_title = ''
    current_url, current_title = browser.current_url, browser.title
    try:
        while True:
            if current_url != browser.current_url:
                current_url = browser.current_url

                wait.until_not(ec.title_is(current_title))

                if url.match(string = browser.current_url):
                    current_title = browser.title

                    if (button := next((_ for _ in browser.find_elements(
                            by = 'xpath',
                            value = '//button[@class="vjs-big-play-button" and '
                                    '@type="button" and '
                                    '@title="Play Video" and '
                                    '@aria-disabled="false"]'
                    )), None)) is not None:
                        with suppress(
                                exc.StaleElementReferenceException,
                                exc.ElementNotInteractableException
                        ):
                            button.click()

                    # sleep(wait.__dict__['_poll'])
                    print(wait.until(
                        ec.visibility_of_element_located((
                            'xpath',
                            '//button[@aria-expanded="false" and @class="inline-block cursor-pointer"]'
                            '/span[@class="leading-tight text-lg tablet:text-2xl font-normal" and contains(@aria-label,"Show")]'
                        ))).text, end = ' ')

                    if len(browser.find_elements(
                            by = 'xpath',
                            value = '//div[@class="pt-4"]/p[@class="font-medium text-light pt-2"]'
                    )):
                        print('(Docuseries)')
                    else:
                        print('(Documentary)')
    except exc.NoSuchWindowException:
        kill(getpid(), SIGTERM)
```


Solution

I'm answering my own question with the fix in case anyone else runs into a similar issue.

After more digging, I found that the problem wasn't in the code but in the site's permissions.

In the address bar, I set the site's Autoplay permission to allow video and audio, and the problem stopped.