python, python-3.x, web-scraping, beautifulsoup, youtube

Scraping New YouTube Videos With BeautifulSoup


I'm new to Python and want to get into web scraping on YouTube. I want to use this link to get the newest uploaded videos: 'https://www.youtube.com/results?search_query=programming&sp=CAISBAgBEAE%253D' and scrape the 5 newest ones. How can I do this? I tested it with the following code, adapted from another question (I only want the links):

from bs4 import BeautifulSoup
import requests

url="https://www.youtube.com/results?search_query=programming&sp=CAISBAgBEAE%253D"
html = requests.get(url)
soup = BeautifulSoup(html.text, features="html.parser") 

for entry in soup.find_all("entry"):
    for link in entry.find_all("link"):
        print(link["href"])

Edit: I don't get any output in the Python terminal. It's not scraping anything. It only shows the default ">>>" prompt.


Solution

  • You cannot scrape YouTube without a Google YouTube API key, which you can get by following these steps. I can post a fuller answer to your question if you still want to try.

    In the meantime, try practicing your parsing with BeautifulSoup on this website: videvo.net

    Here's some code to help you get started:

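    import requests
    from bs4 import BeautifulSoup
    
    def get_source(url):
        # Fetch the page with a browser-like User-Agent and parse it.
        response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
        response.raise_for_status()
        return BeautifulSoup(response.text, 'html.parser')
    
    soup = get_source('http://videvo.net')
    
    # href=True skips <a> tags that have no href attribute.
    for tag in soup.find_all('a', href=True):
        print(tag['href'])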
    

    EDIT: I stand corrected (slightly). YouTube's main pages cannot be parsed this way, but the XML feed of a user's uploads can. You can try this code:

    import requests
    from bs4 import BeautifulSoup
    
    def get_source(url):
        # Fetch the feed with a browser-like User-Agent and parse it.
        response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
        response.raise_for_status()
        return BeautifulSoup(response.text, 'html.parser')
    
    soup = get_source('https://www.youtube.com/feeds/videos.xml?user=kinagrannis')
    
    # Each <entry> in the feed describes one uploaded video.
    for entry in soup.find_all("entry"):
        for title in entry.find_all("title"):
            print(title.text)
        for link in entry.find_all("link"):
            print(link["href"])
        for name in entry.find_all("name"):
            print(name.text)
        for pub in entry.find_all("published"):
            print(pub.text)
    

    Note: you can use any username in place of 'kinagrannis': user=[username]
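
Since the question asks for only the links of the 5 newest videos, here is a rough sketch of how that could be pulled out of the same kind of feed. It is not the code above: it swaps BeautifulSoup for the standard library's xml.etree.ElementTree so that it runs without extra packages. The helper names feed_url and newest_links, the SAMPLE_FEED text, and its video IDs are made up for illustration, and the sketch assumes the feed lists entries newest first. Note that this gives the newest uploads of one user, not the newest results for a search query.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Atom feeds put their elements in this XML namespace.
ATOM = "{http://www.w3.org/2005/Atom}"


def feed_url(username):
    # Hypothetical helper: builds the feed URL pattern used in the answer above.
    return "https://www.youtube.com/feeds/videos.xml?" + urlencode({"user": username})


def newest_links(feed_xml, limit=5):
    # Hypothetical helper: returns the href of the first <link> in each of
    # the first `limit` entries (assumes the feed is ordered newest first).
    root = ET.fromstring(feed_xml)
    links = []
    for entry in root.findall(ATOM + "entry")[:limit]:
        link = entry.find(ATOM + "link")
        if link is not None and link.get("href"):
            links.append(link.get("href"))
    return links


# Made-up sample feed so the sketch runs offline; the video IDs are placeholders.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Video A</title>
    <link rel="alternate" href="https://www.youtube.com/watch?v=AAA"/>
  </entry>
  <entry>
    <title>Video B</title>
    <link rel="alternate" href="https://www.youtube.com/watch?v=BBB"/>
  </entry>
  <entry>
    <title>Video C</title>
    <link rel="alternate" href="https://www.youtube.com/watch?v=CCC"/>
  </entry>
</feed>"""

print(feed_url("kinagrannis"))
print(newest_links(SAMPLE_FEED, limit=2))
```

To try it on a live feed, the text fetched from feed_url(...) with requests, as in the answer's code, could be passed to newest_links in place of SAMPLE_FEED.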