python beautifulsoup response speech-recognition

Extracting values from Beautiful Soup

I'm quite new to programming and I'm working on a voice assistant using Python. I found this code on GitHub, but it doesn't work as it should. Here is the code:

# Imports assumed; the original snippet did not show them
from urllib.parse import quote

from bs4 import BeautifulSoup
from requests import get


def Play(speech):
    if speech.endswith("on YouTube"):
        searchTerm = speech.split()
        # Drop the trailing "on YouTube" and URL-encode the rest of the phrase
        response = get("https://www.youtube.com/results?search_query=" + quote(" ".join(searchTerm[:-2])))
        soup = BeautifulSoup(response.text, "html.parser")
        videos = soup.findAll(attrs={"class":"yt-uix-tile-link"})[1:4]
        #Was [:3], changed to [1:4] to try to stop ads
        #Try to remove google ads if possible (May have fixed, but test this)
        names = list()
        links = list()
        for i in range(len(videos)):
            names.insert(i, videos[i]["title"])
            links.insert(i, "https://www.youtube.com" + videos[i]["href"])
        print("I found 3 videos. " + ". ".join(names), links)

The URL passed to get() works correctly, and so does the soup variable, but "videos" is empty, so no results are printed at the end, and I don't know how to resolve this.

Any ideas, please? :)

Solution

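You can't get the contents of a dynamic website like YouTube using requests alone: the search results are built by JavaScript in the browser, so the HTML that requests receives does not contain the rendered elements. Sorry to be so direct, but this is the truth.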

You first need to load the URL, then render the page using something like Chromium in the background, then pass the result to Beautiful Soup.

The rendering will take 1-2 seconds. This is how it's done.

Here is a snippet that extracts the contents of a dynamic website and passes them to BeautifulSoup:

# pip install playwright
from playwright.sync_api import sync_playwright
# After installing, you will be prompted
# to install Chromium, the browser mentioned above
from bs4 import BeautifulSoup


def get_dynamic_soup(url: str) -> BeautifulSoup:
    with sync_playwright() as p:
        # Launch the browser
        browser = p.chromium.launch()

        # Open a new browser page
        page = browser.new_page()

        # Navigate to the target URL in the opened page
        page.goto(url)

        # Process extracted content with BeautifulSoup
        soup = BeautifulSoup(page.content(), "html.parser")

        browser.close()

        return soup

# quote and searchTerm are defined in your code
_url = "https://www.youtube.com/results?search_query=" + quote(" ".join(searchTerm[:-2]))
soup = get_dynamic_soup(_url)
# Now you can do whatever you want with the soup
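
The URL line above relies on searchTerm and quote from the question's code. A minimal, self-contained sketch of that step could look like the following; build_search_url is a hypothetical helper name, and quote is assumed to be urllib.parse.quote, since the question does not show its imports.

```python
from urllib.parse import quote

SEARCH_URL = "https://www.youtube.com/results?search_query="


def build_search_url(speech: str) -> str:
    """Build a YouTube search URL from a phrase ending in "on YouTube"."""
    words = speech.split()
    # Drop the last two words ("on YouTube"), as the question's code does
    query = " ".join(words[:-2])
    # quote() percent-encodes spaces and other unsafe characters
    return SEARCH_URL + quote(query)
```

The returned string can be passed straight to get_dynamic_soup.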

Then you can do your own processing:

videos = soup.findAll(attrs={"class":"yt-uix-tile-link"})[1:4]
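
Since that slice can come back empty, it may help to build the spoken summary from however many results were actually found rather than assuming three. A rough sketch, where describe_videos is a hypothetical helper and each item only needs to support ["title"] and ["href"] lookups (as Beautiful Soup tags do):

```python
def describe_videos(videos, base_url="https://www.youtube.com"):
    """Return (summary sentence, list of absolute links) for the given results."""
    names = [video["title"] for video in videos]
    links = [base_url + video["href"] for video in videos]
    if not names:
        # Nothing matched: say so instead of claiming three results
        return "I found no videos.", links
    noun = "video" if len(names) == 1 else "videos"
    return f"I found {len(names)} {noun}. " + ". ".join(names), links
```

Plain dictionaries work for a quick check without any network access, for example describe_videos([{"title": "A", "href": "/watch?v=1"}]).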

To install Playwright:

python -m pip install playwright # this installs the Python package
python -m playwright install # this installs the browser executables, including Chromium

See the Playwright docs for installation details.

EDIT: I found an issue in your code. This line

videos = soup.findAll(attrs={"class":"yt-uix-tile-link"})[1:4]

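is fragile because it does not name the HTML element you want to search for. Beautiful Soup does accept attrs on its own, but naming the tag narrows the match to the elements you actually want.

A good example is: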

videos = soup.findAll("div", attrs={
    "class": "yt-uix-tile-link"
})[1:4]
# or 
videos = soup.findAll("span", attrs={
    "class": "yt-uix-tile-link"
})[1:4]
# or whatever element it is
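
To see how the tag name affects matching, here is a small sketch run against a made-up HTML fragment. It is not real YouTube markup, and the class name is taken from the question, so it may not exist on the current site. find_all is the current spelling of findAll.

```python
from bs4 import BeautifulSoup

# Invented sample markup, for illustration only
SAMPLE_HTML = """
<div>
  <a class="yt-uix-tile-link" href="/watch?v=aaa" title="First">First</a>
  <span class="yt-uix-tile-link">not a link</span>
  <a class="yt-uix-tile-link" href="/watch?v=bbb" title="Second">Second</a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Matches every element carrying the class, whatever its tag
any_tag = soup.find_all(attrs={"class": "yt-uix-tile-link"})

# Matches only <a> elements carrying the class
anchors_only = soup.find_all("a", attrs={"class": "yt-uix-tile-link"})

any_tag_names = [tag.name for tag in any_tag]
anchor_titles = [tag["title"] for tag in anchors_only]
```

In this fragment the first search also picks up the span, which has no href or title, while the second returns only the two links, so later lookups such as videos[i]["href"] are less likely to fail.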