I need to scrape car type video from YouTube by some tags like this list in Google Colab :
Abarth
AC
Acura
Adam
Adler
AEC
Aero
Aixam
Albion
SO i have tried these two code to find the video tag ( for example tag='Peugeot') in google colab:
!pip install youtube-search-python
from youtubesearchpython import SearchVideos
search = SearchVideos("NoCopyrightSounds", offset = 1, mode = "json", max_results = 20)
print(search.result())
and
!pip install youtube-dl
!echo '' > ford_video_list.txt
!chmod 755 ford_video_list.txt
!youtube-dl --match-title 'ford' --add-metadata --write-thumbnail --list-thumbnails --mark-watched --write-info-json 'ford_video_description_json.txt' --write-description 'ford_video_description.txt' --cookies='Search-youtube-url-file.txt' --ignore-errors --skip-download --get-url -f bestvideo+bestaudio/best --default-search "ytsearch2000:" "Ford Festiva" >> ford_video_list.txt
!echo '*****End of test 1 ******'
But by trying this code it don't showing any result:
import urllib.request
from bs4 import BeautifulSoup
textToSearch = 'python tutorials'
query = urllib.parse.quote(textToSearch)
url = "https://www.youtube.com/results?search_query=" + query
response = urllib.request.urlopen(url)
html = response.read()
soup = BeautifulSoup(html, 'html.parser')
for vid in soup.findAll(attrs={'class':'yt-uix-tile-link'}):
if not vid['href'].startswith("https://googleads.g.doubleclick.net/"):
print('https://www.youtube.com' + vid['href'])
So, i guess the class name is not correct!, and i asked here for debugging it.
Update: I have made one google colab page (shown below) to test those codes ( also the code of youtube-dl showing this error:
https://colab.research.google.com/drive/1bZQ68gLggTQHCG_5fQQJJTICHA4K3HJ3?usp=sharing
ERROR: Unable to download webpage: HTTP Error 429: Too Many Requests (caused by <HTTPError 429: 'Too Many Requests'>); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
I understand that error made because:
The google don't like too many request form one IP Address.
So tried to add these tags(--rm-cache-dir --force-ipv4 --verbose
) to youtube-dl
command as you can see below ( based of these reffrences 1 2 3):
ERROR: Unable to download webpage: HTTP Error 429: Too Many Requests (caused by <HTTPError 429: 'Too Many Requests'>); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
File "/usr/local/lib/python3.6/dist-packages/youtube_dl/extractor/common.py", line 632, in _request_webpage
return self._downloader.urlopen(url_or_request)
File "/usr/local/lib/python3.6/dist-packages/youtube_dl/YoutubeDL.py", line 2238, in urlopen
return self._opener.open(req, timeout=self._socket_timeout)
File "/usr/lib/python3.6/urllib/request.py", line 532, in open
response = meth(req, response)
File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.6/urllib/request.py", line 564, in error
result = self._call_chain(*args)
File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/usr/lib/python3.6/urllib/request.py", line 756, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "/usr/lib/python3.6/urllib/request.py", line 532, in open
response = meth(req, response)
File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.6/urllib/request.py", line 570, in error
return self._call_chain(*args)
File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
Thanks.
It has been working by changing the '"ytsearch2000:" "Ford Festiva"` :
to ' "ytsearch50":"Ford Festiva" as you can see below:
!pip install youtube-dl
# !youtube-dl --default-search gvsearch5:how to develop for android --no-playlist --write-info-json --write-annotation --write-thumbnail --write-sub --skip-download
!youtube-dl --match-title 'ford' "ytsearch50":"Ford Festiva"+"peugeot 405" --write-info-json --write-annotation --write-thumbnail --write-sub -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/mp4'
and the problem was because of :
and "
location mistaking!
the entire code for scraping the video of some car type video from google could be seen here: https://colab.research.google.com/github/CAR-Driving/yoloOnGoogleColab/blob/master/database_creating/Yoututbe_scraping_by_colab.ipynb