Hi, I have Excel files with lists of YouTube URLs (three files, thousands of URLs each) and I'm trying to fetch each video's title. I tried doing it with Python, but it turned out to be far too slow because I had to put a sleep on the HTML render. My code looks like this:
import xlrd
import time
from bs4 import BeautifulSoup
import requests
from xlutils.copy import copy
from requests_html import HTMLSession

loc = "testt.xls"
wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)
wb2 = copy(wb)
sheet.cell_value(0, 0)

for i in range(3, sheet.nrows):
    ytlink = sheet.cell_value(i, 0)
    session = HTMLSession()
    response = session.get(ytlink)
    response.html.render(sleep=3)  # this sleep is what makes it so slow
    print(sheet.cell_value(i, 0))
    print(ytlink)
    element = BeautifulSoup(response.html.html, "lxml")
    media = element.select_one('#container > h1').text
    print(media)
    s2 = wb2.get_sheet(0)
    s2.write(i, 0, media)

wb2.save("testt.xls")
Is there any way to make this faster? I tried Selenium, but it seemed even slower. With html.render I seem to need the sleep parameter, otherwise it throws an error; I tried lower sleep values, but then it errors out after a while. Any help would be appreciated, thanks :)
PS: the prints are just there so I can check the output; they're not important for the actual usage.
You can do 1000 requests in less than a minute using async requests-html like this:
import random
from time import perf_counter
from requests_html import AsyncHTMLSession

urls = ['https://www.youtube.com/watch?v=z9eoubnO-pE'] * 1000

asession = AsyncHTMLSession()
start = perf_counter()

async def fetch(url):
    # the CONSENT cookie skips YouTube's cookie-consent page (see the note below)
    r = await asession.get(url, cookies={'CONSENT': 'YES+cb.20210328-17-p0.en-GB+FX+{}'.format(random.randint(100, 999))})
    return r

# a lambda with a default argument binds each url to its own coroutine factory
all_responses = asession.run(*[lambda url=url: fetch(url) for url in urls])

all_titles = [r.html.find('title', first=True).text for r in all_responses]

print(all_titles)
print(perf_counter() - start)
Done in 55s on my laptop.
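If you want to run it on the actual links from your spreadsheet instead of the repeated test URL, a minimal sketch for building the urls list (assuming, as in your loop, the URLs sit in column 0 starting at row 3) could look like this:

import xlrd

# read the YouTube URLs from the workbook instead of the hard-coded test list
wb = xlrd.open_workbook("testt.xls")
sheet = wb.sheet_by_index(0)
urls = [sheet.cell_value(i, 0) for i in range(3, sheet.nrows)]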
Note that you need to pass cookies={'CONSENT': 'YES+cb.20210328-17-p0.en-GB+FX+{}'.format(random.randint(100, 999))} with each request; without it YouTube may serve its cookie-consent page instead of the video page, and the title you want won't be in the HTML.
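Once all_titles is collected, you could write the results back to the workbook with the same xlrd/xlutils combination you already use. A rough sketch (the output filename and the choice to put titles in column 1 next to the URLs are just assumptions on my part):

import xlrd
from xlutils.copy import copy

rb = xlrd.open_workbook("testt.xls")
wb_out = copy(rb)                       # writable copy of the workbook
out_sheet = wb_out.get_sheet(0)

# assumes all_titles is in the same order as the URLs read from rows 3..nrows-1
for row, title in enumerate(all_titles, start=3):
    out_sheet.write(row, 1, title)      # column 1, so the original URLs in column 0 are kept

wb_out.save("testt_with_titles.xls")    # save once at the end, not inside the loop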