I'm wondering how youtube-dl generate direct link to video. I know that
with youtube-dl --get-url link I can get this, but I want to know how this process goes. (From downloading html page to getting link). Is there a way to check this out?
Youtube-dl is open-source so I guess it is, but I just don't know where specifically I should look.
Thanks in advance
youtube-dl uses classes called InfoExtractor
to make it possible to download videos from different sites. The info extractor for youtube videos is in /youtube_dl/extractor/youtube.py
.
This class is quite complex, since it deals with logging in users and different kinds of videos and channels etc. I think that the relevant part is:
url = proto + '://www.youtube.com/watch?v=%s&gl=US&hl=en&has_verified=1&bpctr=9999999999' % video_id
Where the video_id
is extracted by a big regex:
_VALID_URL = r"""(?x)^
(
(?:https?://|//) # http(s):// or protocol-independent URL
(?:(?:(?:(?:\w+\.)?[yY][oO][uU][tT][uU][bB][eE](?:-nocookie)?\.com/|
(?:www\.)?deturl\.com/www\.youtube\.com/|
(?:www\.)?pwnyoutube\.com/|
(?:www\.)?yourepeat\.com/|
tube\.majestyc\.net/|
youtube\.googleapis\.com/) # the various hostnames, with wildcard subdomains
(?:.*?\#/)? # handle anchor (#/) redirect urls
(?: # the various things that can precede the ID:
(?:(?:v|embed|e)/(?!videoseries)) # v/ or embed/ or e/
|(?: # or the v= param in all its forms
(?:(?:watch|movie)(?:_popup)?(?:\.php)?/?)? # preceding watch(_popup|.php) or nothing (like /?v=xxxx)
(?:\?|\#!?) # the params delimiter ? or # or #!
(?:.*?&)? # any other preceding param (like /?s=tuff&v=xxxx)
v=
)
))
|youtu\.be/ # just youtu.be/xxxx
|(?:www\.)?cleanvideosearch\.com/media/action/yt/watch\?videoId=
)
)? # all until now is optional -> you can pass the naked ID
([0-9A-Za-z_-]{11}) # here is it! the YouTube video ID
(?!.*?&list=) # combined list/video URLs are handled by the playlist IE
(?(1).+)? # if we found the ID, everything can follow
$"""
Luckily it's commented...