python, python-3.x, youtube, beautifulsoup

Web scraping YouTube with Python 3


I'm working on a project where I need to store the date on which a YouTube video was published.
The problem is that I'm having trouble finding this data in the page's HTML source.

Here's my code attempt:

import requests
from bs4 import BeautifulSoup as BS

url = "https://www.youtube.com/watch?v=XQgXKtPSzUI&t=915s"
response = requests.get(url)
soup = BS(response.content, "html.parser")
response.close()

dia = soup.find_all('span',{'class':'date'})
print(dia)

Output:

[]

I know that the arguments I'm passing to .find_all() are wrong, because I was able to retrieve other information about the video, such as the title and the view count, using the same code.
I've tried different arguments to .find_all() but couldn't figure out how to find the date.


Solution

  • If you use the pafy library, the object it returns exposes the publish date directly.

    Install pafy: "pip install pafy"

    import pafy
    vid = pafy.new("www.youtube.com/watch?v=2342342whatever")
    published_date = vid.published
    print(published_date)   # print() is a function in Python 3
    

    Check out the pafy docs for more info: https://pythonhosted.org/Pafy/ I'm including the link because it's a really neat module: it fetches the data without an external requests module and exposes a number of other useful properties of the video, such as the download link for the best available format.
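
  • If you would rather stay close to the original scraping attempt, here is a minimal sketch. The empty list in the question suggests that the span with class "date" is probably not in the HTML that requests receives. The sketch assumes (not confirmed here) that the watch page instead carries the date in a <meta itemprop="datePublished"> tag. The names PublishDateParser, extract_publish_date and sample_html are made up for illustration, and the sample markup is a hypothetical stand-in for response.text. It uses only the standard library's html.parser; with BeautifulSoup the equivalent lookup would be soup.find("meta", itemprop="datePublished").

```python
from html.parser import HTMLParser


class PublishDateParser(HTMLParser):
    """Collects the content of the first <meta itemprop="datePublished"> tag."""

    def __init__(self):
        super().__init__()
        self.published = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a dict makes lookups easier.
        attributes = dict(attrs)
        if (
            self.published is None
            and tag == "meta"
            and attributes.get("itemprop") == "datePublished"
        ):
            self.published = attributes.get("content")


def extract_publish_date(html):
    """Return the datePublished value found in html, or None if it is absent."""
    parser = PublishDateParser()
    parser.feed(html)
    return parser.published


# Hypothetical sample markup, standing in for response.text from requests.
sample_html = (
    '<html><head>'
    '<meta itemprop="datePublished" content="2019-03-14">'
    '</head><body></body></html>'
)
print(extract_publish_date(sample_html))
```

    If the function returns None for a real page, the tag is not in the downloaded HTML, and the assumption about the meta tag does not hold for that page.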