Tags: python, selenium, web-scraping, youtube, screen-scraping

Scraping YouTube links from a webpage


I've been trying to scrape YouTube links from a webpage, but nothing has worked. This is a picture of what I've been trying to scrape:

This is the code I tried most recently:

youtube_link = soup.find("a", class_="ytp-title-link yt-uix-sessionlink")

And this is the link to the website the YouTube link is in: https://www.electronic-festivals.com/event/i-am-hardstyle-germany


Solution

  • Most of the YouTube links are inside an iframe, and JavaScript also needs to run before they exist, so try Selenium. The following extracts any src or href containing "youtube". I only enter the key iframe hosting the YouTube clip; you could instead loop over all iframes and check each one.

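    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Matches any element whose href or src contains "youtube".
    YOUTUBE_SELECTOR = "[href*=youtube], [src*=youtube]"

    def addItems(links, final):
        """Append each element's src (or its href, if it has no src) to final."""
        for link in links:
            src = link.get_attribute('src')
            final.append(src if src is not None else link.get_attribute('href'))
        return final

    url = "https://www.electronic-festivals.com/event/i-am-hardstyle-germany"
    driver = webdriver.Chrome()
    driver.get(url)
    final = []

    # Enter the iframe that hosts the YouTube clip.
    # find_element(By..., ...) replaces find_element_by_css_selector, which newer Selenium 4 releases removed.
    driver.switch_to.frame(driver.find_element(By.CSS_SELECTOR, '.media-youtube-player'))
    try:
        links = WebDriverWait(driver, 10).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, YOUTUBE_SELECTOR)))
        addItems(links, final)
    except TimeoutException:
        pass  # nothing matched inside the iframe within 10 seconds
    finally:
        # Always return to the top-level document.
        driver.switch_to.default_content()

    # Collect matches from the top-level document as well.
    links = driver.find_elements(By.CSS_SELECTOR, YOUTUBE_SELECTOR)
    addItems(links, final)

    # Print each distinct link once.
    for link in set(final):
        print(link)

    driver.quit()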
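
The selector `[href*=youtube], [src*=youtube]` is a substring test on two attributes. As a rough illustration of that filter without a browser, the sketch below applies the same test to an HTML string using the standard library's `html.parser`. The sample markup, the `VIDEO_ID` placeholder, and the names `YouTubeLinkParser` and `youtube_links` are made up for this example. For the real page the HTML would have to be the rendered document (for instance Selenium's `driver.page_source`), because JavaScript has to run first; parsing the raw response is not expected to be enough.

```python
from html.parser import HTMLParser

# Made-up markup for illustration only; VIDEO_ID is a placeholder.
SAMPLE_HTML = """
<iframe class="media-youtube-player" src="https://www.youtube.com/embed/VIDEO_ID"></iframe>
<a href="https://www.youtube.com/watch?v=VIDEO_ID">Watch</a>
<a href="https://example.com/">Other</a>
<a href="https://www.youtube.com/watch?v=VIDEO_ID">Watch again</a>
"""


class YouTubeLinkParser(HTMLParser):
    """Hypothetical helper: collect src/href values that contain 'youtube'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name in ("src", "href") and value and "youtube" in value:
                self.links.append(value)


def youtube_links(html):
    """Return the distinct matching URLs found in html, sorted for stable output."""
    parser = YouTubeLinkParser()
    parser.feed(html)
    parser.close()
    return sorted(set(parser.links))


if __name__ == "__main__":
    for link in youtube_links(SAMPLE_HTML):
        print(link)
```

This only covers the filtering step. It does not enter iframes, so links that exist only inside the embedded player's own document still need the Selenium frame switch.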