Can someone help me properly scrape YouTube titles in Python using BS4?

I want to collect YouTube titles using BS4 in Python. This is code that ChatGPT recommended to me, but it doesn't work well. I'm hoping an experienced coder here can help. Thank you :)

import requests
from bs4 import BeautifulSoup

def get_youtube_titles():
    url = 'https://www.youtube.com/'

    try:
        response = requests.get(url)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
    
        # Find YouTube title elements
        title_elements = soup.find_all('a', class_='yt-simple-endpoint focus-on-expand style-scope ytd-rich-grid-media')
    
        # Extract and print the titles
        for title_element in title_elements:
            title = title_element.text.strip()
            print(title)
    
    except requests.exceptions.RequestException as e:
        print('Network connection error:', e)

# Get YouTube titles

get_youtube_titles()

I asked ChatGPT, but its code doesn't work well.

Solution

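Your code uses requests.get, so you only get the source HTML that the server sends, which is not the same as the fully rendered HTML you see when you inspect the page in your browser. On YouTube, the video grid (including those ytd-rich-grid-media links) is built by JavaScript after the page loads, so the elements your selector looks for are not in response.text. To get the rendered page, you would need a tool that runs JavaScript, such as Selenium (and don't forget to add some wait time so the page can finish loading).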
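To see the difference without touching the network, here is a small offline sketch. The HTML string is a hypothetical, heavily simplified stand-in for what requests.get might return (the real page is far larger): the video data sits inside a script tag as a JavaScript variable, while the link elements you see in the browser's inspector do not exist yet.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the raw HTML returned by requests.get().
# The titles live inside a JavaScript variable; no <a> title elements exist yet.
RAW_HTML = """
<html><head><title>YouTube</title></head>
<body>
<div id="app"></div>
<script>var ytInitialData = {"titles": ["First video", "Second video"]};</script>
</body></html>
"""

soup = BeautifulSoup(RAW_HTML, 'html.parser')

# The selector from the question finds nothing: in a real browser these
# elements are only created after JavaScript runs.
links = soup.find_all('a', class_='yt-simple-endpoint focus-on-expand style-scope ytd-rich-grid-media')
print(len(links))  # 0

# The data is still present, just inside a <script> tag.
script = soup.find('script', string=lambda s: s and 'ytInitialData' in s)
print(script is not None)  # True
```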

However, if all you want are the titles, you can try extracting them from the data YouTube embeds in its script tags, using the following functions:

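import json

## a general function for extracting a JavaScript variable from a bs4 object
## (assumes the value is assigned as "valDecl = <value>;" inside a <script> tag)
def get_jsScriptVal(jSoup, valDecl, isJson=True):
    script_finder = lambda s: s and valDecl in s
    # check every <script> whose text mentions the declaration
    for sc in jSoup.find_all('script', string=script_finder):
        # split the script into statements and look for "valDecl = value"
        # (note: a ';' inside a JSON string value would break this split)
        for st in sc.string.split(';'):
            ls, rs, *_ = [s.strip() for s in (st.split('=', 1) + [''])]
            if ls == valDecl and rs: return json.loads(rs) if isJson else rs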


## specifically for your case (the key path matches YouTube's page layout at the time of writing and may change)
def get_ytInitialTitles(ySoup):
    contents = get_jsScriptVal(ySoup, 'var ytInitialData')['contents']
    tab1 = contents['twoColumnBrowseResultsRenderer']['tabs'][0]
    contents = tab1['tabRenderer']['content']['richGridRenderer']['contents']
    contents = [c['richItemRenderer']['content']['videoRenderer'] 
                for c in contents if 'richItemRenderer' in c and 
                'videoRenderer' in c['richItemRenderer']['content']]
    titles = [c['title']['runs'][0]['text'] for c in contents]
    return titles
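The key path in get_ytInitialTitles reflects how YouTube laid out ytInitialData when this was written, and YouTube can change it without notice. If it stops matching, one optional alternative (a different technique from the fixed path) is to search the parsed data recursively for every videoRenderer object. find_video_titles, make_item and the trimmed-down sample below are hypothetical, modeled on the keys used in get_ytInitialTitles:

```python
def find_video_titles(node):
    """Recursively collect the first title run of every 'videoRenderer'
    dict found anywhere inside a parsed ytInitialData-like structure."""
    titles = []
    if isinstance(node, dict):
        video = node.get('videoRenderer')
        if isinstance(video, dict):
            runs = video.get('title', {}).get('runs', [])
            if runs:
                titles.append(runs[0].get('text', ''))
        # keep descending into every value, whatever the key path is
        for value in node.values():
            titles.extend(find_video_titles(value))
    elif isinstance(node, list):
        for item in node:
            titles.extend(find_video_titles(item))
    return titles


def make_item(title):
    # Hypothetical minimal entry shaped like a richItemRenderer row
    return {"richItemRenderer": {"content": {"videoRenderer": {"title": {"runs": [{"text": title}]}}}}}

sample = {"contents": {"twoColumnBrowseResultsRenderer": {"tabs": [
    {"tabRenderer": {"content": {"richGridRenderer": {"contents": [
        make_item("First video"),
        {"richSectionRenderer": {}},  # non-video rows are skipped
        make_item("Second video"),
    ]}}}}
]}}}

print(find_video_titles(sample))  # ['First video', 'Second video']
```

On the real page you would call it as find_video_titles(get_jsScriptVal(soup, 'var ytInitialData')). The trade-off is that it may also pick up videoRenderer items from other sections of the page, not just the home grid.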

Now, if you edit your code to use the functions above:

import requests
from bs4 import BeautifulSoup
import json

#### DON'T FORGET TO PASTE THE FUNCTION DEFINITIONS INTO YOUR CODE TOO ####
## def get_jsScriptVal....
## def get_ytInitialTitles....
##########################################################################

def get_youtube_titles():
    url = 'https://www.youtube.com/'
    try:
        response = requests.get(url)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
    
        # titles = get_ytInitialTitles(soup) # Find YouTube title elements
        # for title in titles: print(title) # Extract and print the titles

        # OR [in one line]
        for title in get_ytInitialTitles(soup): print(title)
    
    except Exception as e:
        print('Failed to scrape due to', type(e), ':', e)

get_youtube_titles()

Then it should print something like this (the actual titles will vary):

Survive 100 Days In Circle, Win $500,000
lofi hip hop radio 📚 - beats to relax/study to
Spectaculair ingekleurde film over het begin van de Duitse bezetting van Nederland tijdens WOII
Omtzigt is WOEST & SLOOPT liegende Rutte! 'Kijk die ouders in hun ogen!'
Ineens vielen er bommen op zonnepanelen... Algemene beschouwingen Venlo 2023
Trump Opens Up on Secret White House Documents, Biden Family & Republican Opponents | Trump LIVE
An einem Tag nach Mallorca und zurück: Was verdient ein Flugbegleiter? | Lohnt sich das | BR
I BUILT A SHELTER IN THE FOREST!! AND LIVED THERE FOR 2 MONTHS!
De halvering van China
Ibiza Summer Mix 2023 🍓 Best Of Tropical Deep House Music Chill Out Mix 2023🍓 Chillout Lounge #153
Tibetaanse Genezende Fluit • Afgifte van melatonine en gifstoffen • Elimineer stress en kalmeer ...
Alle 200 POTLODEN GEBRUIKEN in 1 TEKENING - Tekenen Challenge
Top 10 BEST Auditions on BGT 2023!
Ontspannende muziek tot opluchting stress, angst en depressie 🐬 Verzachtende muziek voor zenuwen
6 juni 1944, D-Day, Operatie Overlord | Ingekleurd
Ed Sheeran, Martin Garrix, Kygo, Dua Lipa, Avicii, Robin Schulz, The Chainsmokers Style - Feeling Me
DIY with Mr Bean | Full Episodes | Classic Mr Bean
EEN WEDSTRIJD VOL AFSCHEID! 😭🫡 | Barcelona vs Mallorca | La Liga 2022/23 | Samenvatting
Deep Focus Music To Improve Concentration - 12 Hours of Ambient Study Music to Concentrate #506
The Inside Guys React To The Miami Heat's Blowout Game 7 Win In Boston | NBA on TNT
Muziek genezen om stress, vermoeidheid, depressie, negativiteit, detoxemoties te verlichten
How Rain Caused Havoc And Changed The Race | 2023 Monaco Grand Prix
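One caveat about get_jsScriptVal: it splits the script on ';', so if any string inside the JSON contains a semicolon (a video title could), the JSON is cut short and json.loads raises an error. Below is a sketch of a more tolerant variant that uses a different technique, json.JSONDecoder.raw_decode, instead of the split. extract_js_json is a hypothetical name, and it takes the script text (e.g. sc.string) rather than the soup:

```python
import json

def extract_js_json(script_text, var_decl):
    """Hypothetical helper: return the JSON value assigned to var_decl
    (e.g. 'var ytInitialData') in script_text, or None if absent."""
    pos = script_text.find(var_decl)
    if pos == -1:
        return None
    eq = script_text.find('=', pos + len(var_decl))
    if eq == -1:
        return None
    # raw_decode parses exactly one JSON value and reports where it ended,
    # so the trailing ';' and any later statements are simply ignored.
    # It does not skip leading whitespace, hence the lstrip().
    rest = script_text[eq + 1:].lstrip()
    try:
        value, _end = json.JSONDecoder().raw_decode(rest)
    except json.JSONDecodeError:
        return None
    return value


script = 'var ytInitialData = {"title": "Part 1; Part 2"}; var other = 1;'
print(extract_js_json(script, 'var ytInitialData'))  # {'title': 'Part 1; Part 2'}
```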