I want to collect YouTube titles using BS4 in Python. This is the code GPT recommended, but it doesn't work well. I'm hoping an experienced coder here can help. Thank you :)
import requests
from bs4 import BeautifulSoup

def get_youtube_titles():
    url = 'https://www.youtube.com/'
    try:
        response = requests.get(url)
        soup = BeautifulSoup(response.text, 'html.parser')
        # Find YouTube title elements
        title_elements = soup.find_all('a', class_='yt-simple-endpoint focus-on-expand style-scope ytd-rich-grid-media')
        # Extract and print the titles
        for title_element in title_elements:
            title = title_element.text.strip()
            print(title)
    except requests.exceptions.RequestException as e:
        print('Network connection error:', e)

# Get YouTube titles
get_youtube_titles()
I asked GPT, but its code doesn't work well.
Your code uses requests.get, so you only get the source HTML, which is not the same as the fully rendered HTML you inspect in your browser. To get the rendered page, you need something that runs JavaScript, such as Selenium (and don't forget to add some wait time so the page can load).
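In case it helps, here is a minimal sketch of the Selenium route with an explicit wait. The function names and the CSS selector `a#video-title-link` are my own assumptions, not something from your code; YouTube changes its markup often, so check the selector in your browser first. It also assumes Selenium and Chrome are installed locally.

```python
def clean_titles(raw_texts):
    """Strip whitespace and drop empty or missing entries."""
    return [t.strip() for t in raw_texts if t and t.strip()]


def fetch_rendered_titles(url='https://www.youtube.com/', timeout=15):
    """Load the page in a real browser and return the rendered video titles."""
    # Imported here so the rest of the file still loads without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # ASSUMPTION: hypothetical selector for the title links; verify it yourself.
    title_selector = (By.CSS_SELECTOR, 'a#video-title-link')

    driver = webdriver.Chrome()  # assumes Chrome is available on this machine
    try:
        driver.get(url)
        # Wait until JavaScript has rendered at least one matching element;
        # raises TimeoutException if nothing shows up within `timeout` seconds.
        elements = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located(title_selector)
        )
        return clean_titles(el.text for el in elements)
    finally:
        driver.quit()  # always close the browser, even on errors
```

Calling `fetch_rendered_titles()` opens a browser window, waits for the titles to appear, and returns them as a list of strings.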
However, if all you want are some titles, you can try extracting them from the script tags that contain the JavaScript data, using the following functions:
import json

## a general function for extracting a JavaScript variable from a bs4 object
def get_jsScriptVal(jSoup, valDecl, isJson=True):
    # match only script tags whose text contains the declaration
    script_finder = lambda s: s and valDecl in s
    # find_all returns a list of matching tags (find returns a single tag or None)
    for sc in jSoup.find_all('script', string=script_finder):
        for st in sc.string.split(';'):
            ls, rs, *_ = [s.strip() for s in (st.split('=', 1) + [''])]
            if ls == valDecl and rs:
                return json.loads(rs) if isJson else rs
## specifically for your case
def get_ytInitialTitles(ySoup):
    contents = get_jsScriptVal(ySoup, 'var ytInitialData')['contents']
    tab1 = contents['twoColumnBrowseResultsRenderer']['tabs'][0]
    contents = tab1['tabRenderer']['content']['richGridRenderer']['contents']
    contents = [c['richItemRenderer']['content']['videoRenderer']
                for c in contents if 'richItemRenderer' in c and
                'videoRenderer' in c['richItemRenderer']['content']]
    titles = [c['title']['runs'][0]['text'] for c in contents]
    return titles
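To make the key path easier to follow, here is a small sketch that walks the same keys over a hand-made dictionary. The sample data and the name `titles_from_initial_data` are made up for illustration; the real ytInitialData is much larger and its layout may change. This variant also skips items that lack a title instead of raising an error.

```python
def titles_from_initial_data(data):
    """Walk the same key path as get_ytInitialTitles, skipping odd items."""
    tabs = data['contents']['twoColumnBrowseResultsRenderer']['tabs']
    grid = tabs[0]['tabRenderer']['content']['richGridRenderer']['contents']
    titles = []
    for item in grid:
        content = item.get('richItemRenderer', {}).get('content', {})
        video = content.get('videoRenderer')
        if video is None:
            continue  # not a video entry, so there is no title to read
        runs = video.get('title', {}).get('runs', [])
        if runs:
            titles.append(runs[0]['text'])
    return titles


# Hypothetical, heavily trimmed stand-in for the real data.
sample = {
    'contents': {'twoColumnBrowseResultsRenderer': {'tabs': [{
        'tabRenderer': {'content': {'richGridRenderer': {'contents': [
            {'richItemRenderer': {'content': {'videoRenderer': {
                'title': {'runs': [{'text': 'First video'}]}}}}},
            {'otherRenderer': {}},  # made-up non-video item, gets skipped
            {'richItemRenderer': {'content': {'videoRenderer': {
                'title': {'runs': [{'text': 'Second video'}]}}}}},
        ]}}}
    }]}}
}

print(titles_from_initial_data(sample))  # ['First video', 'Second video']
```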
Now, if you edit your code to use the functions above:
import requests
from bs4 import BeautifulSoup
import json

## def get_jsScriptVal....
## def get_ytInitialTitles....

def get_youtube_titles():
    url = 'https://www.youtube.com/'
    try:
        response = requests.get(url)
        soup = BeautifulSoup(response.text, 'html.parser')
        # titles = get_ytInitialTitles(soup)   # Find YouTube title elements
        # for title in titles: print(title)    # Extract and print the titles
        # OR [in one line]
        for title in get_ytInitialTitles(soup): print(title)
    except Exception as e:
        print('Failed to scrape due to', type(e), ':', e)

# Get YouTube titles
get_youtube_titles()
Then it should print something like:
Survive 100 Days In Circle, Win $500,000
lofi hip hop radio 📚 - beats to relax/study to
Spectaculair ingekleurde film over het begin van de Duitse bezetting van Nederland tijdens WOII
Omtzigt is WOEST & SLOOPT liegende Rutte! 'Kijk die ouders in hun ogen!'
Ineens vielen er bommen op zonnepanelen... Algemene beschouwingen Venlo 2023
Trump Opens Up on Secret White House Documents, Biden Family & Republican Opponents | Trump LIVE
An einem Tag nach Mallorca und zurück: Was verdient ein Flugbegleiter? | Lohnt sich das | BR
I BUILT A SHELTER IN THE FOREST!! AND LIVED THERE FOR 2 MONTHS!
De halvering van China
Ibiza Summer Mix 2023 🍓 Best Of Tropical Deep House Music Chill Out Mix 2023🍓 Chillout Lounge #153
Tibetaanse Genezende Fluit • Afgifte van melatonine en gifstoffen • Elimineer stress en kalmeer ...
Alle 200 POTLODEN GEBRUIKEN in 1 TEKENING - Tekenen Challenge
Top 10 BEST Auditions on BGT 2023!
Ontspannende muziek tot opluchting stress, angst en depressie 🐬 Verzachtende muziek voor zenuwen
6 juni 1944, D-Day, Operatie Overlord | Ingekleurd
Ed Sheeran, Martin Garrix, Kygo, Dua Lipa, Avicii, Robin Schulz, The Chainsmokers Style - Feeling Me
DIY with Mr Bean | Full Episodes | Classic Mr Bean
EEN WEDSTRIJD VOL AFSCHEID! 😭🫡 | Barcelona vs Mallorca | La Liga 2022/23 | Samenvatting
Deep Focus Music To Improve Concentration - 12 Hours of Ambient Study Music to Concentrate #506
The Inside Guys React To The Miami Heat's Blowout Game 7 Win In Boston | NBA on TNT
Muziek genezen om stress, vermoeidheid, depressie, negativiteit, detoxemoties te verlichten
How Rain Caused Havoc And Changed The Race | 2023 Monaco Grand Prix
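One caveat: get_jsScriptVal splits the script text on ';', so it would cut the JSON short if a title or URL inside it ever contains a semicolon. A possible alternative, sketched below under that assumption, is to let the standard library's `json.JSONDecoder().raw_decode` read one complete JSON value starting right after the `=`. The helper name `extract_js_json` and the sample string are hypothetical.

```python
import json
import re


def extract_js_json(script_text, var_decl):
    """Return the JSON value assigned to var_decl in script_text, or None."""
    # Find "<declaration> =" and consume the whitespace after the equals sign,
    # because raw_decode does not skip leading whitespace itself.
    match = re.search(re.escape(var_decl) + r'\s*=\s*', script_text)
    if match is None:
        return None
    # raw_decode parses exactly one JSON value and ignores whatever follows it.
    value, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    return value


# Made-up script text whose title contains both ';' and '='.
sample_script = 'var ytInitialData = {"title": {"runs": [{"text": "a; b = c"}]}};'
data = extract_js_json(sample_script, 'var ytInitialData')
print(data['title']['runs'][0]['text'])  # a; b = c
```

With bs4 you would pass each matching script tag's `.string` as `script_text`.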