I'm quite new to programming and I'm working on a voice assistant in Python. I found this code on GitHub, but it doesn't work as it should. Here is the code:
def Play(speech):
    if speech.endswith("on YouTube"):
        searchTerm = speech.split()
        response = get("https://www.youtube.com/results?search_query=" + quote(" ".join(searchTerm[:-2])))
        soup = BeautifulSoup(response.text, "html.parser")
        videos = soup.findAll(attrs={"class":"yt-uix-tile-link"})[1:4]
        #Was [:3], changed to [1:4] to try to stop ads
        #Try to remove google ads if possible (May have fixed, but test this)
        names = list()
        links = list()
        for i in range(len(videos)):
            names.insert(i, videos[i]["title"])
            links.insert(i, "https://www.youtube.com" + videos[i]["href"])
        print("I found 3 videos. " + ". ".join(names), links)
The URL passed to get() works correctly, and so does the soup variable, but "videos" is empty, so nothing is printed at the end and I don't know how to resolve this.
Any ideas, please? :)
You can't get the contents of a dynamic website like YouTube using requests alone. Sorry to be so direct, but this is the truth. requests only downloads the HTML the server sends; it does not run the JavaScript that builds the page. You first need to go to the URL, then render the response using something like Chromium in the background, and then pass the result to Beautiful Soup. The rendering will take 1-2 seconds. This is how it's done.
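To see why the parser comes back empty, here is a minimal sketch using only the standard library. The HTML strings and the `result-link` class are made up for illustration (they are not YouTube's real markup), and `ClassCounter` and `count_class` are hypothetical helper names. The point is only that links created by a script do not exist as elements in the HTML that requests downloads.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, class_name: str) -> None:
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html: str, class_name: str) -> int:
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical page as sent by the server: an empty container plus a
# script that would add the links later, inside a real browser.
RAW_HTML = """
<div id="results"></div>
<script>
  document.getElementById("results").innerHTML =
    '<a class="result-link" href="/watch?v=abc">Video</a>';
</script>
"""

# What a browser would hold in the DOM after running that script.
RENDERED_HTML = """
<div id="results"><a class="result-link" href="/watch?v=abc">Video</a></div>
"""

print(count_class(RAW_HTML, "result-link"))       # 0
print(count_class(RENDERED_HTML, "result-link"))  # 1
```

The script body is treated as plain text by the parser, so the link inside it is never seen as an element. A quick check like this on `response.text` can tell you whether the class you are looking for is in the downloaded HTML at all.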
Here is a snippet that extracts the dynamic page contents, which are then passed to BeautifulSoup:
# pip install playwright
from playwright.sync_api import sync_playwright
# after installing you will get prompted
# to install `chromium`, the `thing` I was talking about
from bs4 import BeautifulSoup

def get_dynamic_soup(url: str) -> BeautifulSoup:
    with sync_playwright() as p:
        # Launch the browser
        browser = p.chromium.launch()
        # Open a new browser page
        page = browser.new_page()
        # Navigate to the URL in the opened page
        page.goto(url)
        # Process extracted content with BeautifulSoup
        soup = BeautifulSoup(page.content(), "html.parser")
        browser.close()
        return soup

# quote is defined in your code
_url = "https://www.youtube.com/results?search_query=" + quote(" ".join(searchTerm[:-2]))
soup = get_dynamic_soup(_url)
# now you can do whatever you want with the soup
Then you can do your stuff:
videos = soup.findAll(attrs={"class":"yt-uix-tile-link"})[1:4]
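The `_url` line above depends on `quote` and `searchTerm` from the question's code. As a rough sketch, the URL building can be pulled into a small helper so it can be checked on its own. `build_search_url` and the two constants are hypothetical names, and `quote` is assumed to be `urllib.parse.quote`.

```python
from urllib.parse import quote

SEARCH_URL = "https://www.youtube.com/results?search_query="
SUFFIX = "on YouTube"


def build_search_url(speech: str) -> str:
    """Return the search URL for a phrase ending in "on YouTube"."""
    if not speech.endswith(SUFFIX):
        raise ValueError(f"expected a phrase ending with {SUFFIX!r}")
    # Drop the trailing "on YouTube" (two words), as the question's code does.
    search_term = " ".join(speech.split()[:-2])
    return SEARCH_URL + quote(search_term)


print(build_search_url("lofi hip hop on YouTube"))
# https://www.youtube.com/results?search_query=lofi%20hip%20hop
```

The result can be passed straight to `get_dynamic_soup`.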
python -m pip install playwright # this installs the Python package
python -m playwright install # this installs the browser executables (Chromium included)
See the Playwright docs for installation.
EDIT: I found an issue in your code. This line
videos = soup.findAll(attrs={"class":"yt-uix-tile-link"})[1:4]
is less specific than it should be, because it matches any element with that class. It is better to specify the HTML element you want to search for.
A good example is:
videos = soup.findAll("div", attrs={
    "class": "yt-uix-tile-link"
})[1:4]
# or
videos = soup.findAll("span", attrs={
    "class": "yt-uix-tile-link"
})[1:4]
# or whatever element it is