I am downloading multiple YouTube videos and converting them to audio-only files on my RPi Zero. Initialization and the first download take quite some time, while subsequent downloads are much faster. Is there any way to "warm up" youtube-dl so that even the first download is fast? I don't mind additional initialization time. (Changing the order of the URLs has no effect.)
import time
t1 = time.time()
from youtube_dl import YoutubeDL
ydl = YoutubeDL({'format': 'bestaudio/best'})
t2 = time.time()
print(t2 - t1, flush=True)
ydl.download(['https://www.youtube.com/watch?v=xxxxxxxxxxx'])
t3 = time.time()
print(t3 - t2, flush=True)
ydl.download(['https://www.youtube.com/watch?v=yyyyyyyyyyy'])
t4 = time.time()
print(t4 - t3, flush=True)
ydl.download(['https://www.youtube.com/watch?v=zzzzzzzzzzz'])
t5 = time.time()
print(t5 - t4, flush=True)
Output:
5.889932870864868
[youtube] xxxxxxxxxxx: Downloading webpage
[download] 100% of 4.09MiB in 00:01
15.685529470443726
[youtube] yyyyyyyyyyy: Downloading webpage
[download] 100% of 3.58MiB in 00:00
2.526634693145752
[youtube] zzzzzzzzzzz: Downloading webpage
[download] 100% of 3.88MiB in 00:01
2.4716105461120605
After stepping through the youtube-dl code, I found that most of the time is spent finding the correct InfoExtractor for YouTube URLs. When downloading the first media item, the framework goes through hundreds of possible extractors (executing a regular expression for each one) and finally settles on the correct YouTube extractor, which is at position 1122 in my case!
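To make the effect easier to see without youtube-dl installed, here is a small standard-library model of that lookup. Everything in it is made up for illustration: `FakeExtractor`, `find_extractor`, and the `siteN.example` patterns are hypothetical and are not youtube-dl code. It also assumes that each extractor compiles its URL pattern on first use and keeps the compiled result, which would explain why only the first lookup is slow.

```python
import re
import time


class FakeExtractor:
    """Hypothetical stand-in for an extractor; compiles its URL regex lazily."""

    def __init__(self, name, pattern):
        self.name = name
        self.pattern = pattern
        self._compiled = None  # filled in on first use, then reused

    def suitable(self, url):
        if self._compiled is None:
            self._compiled = re.compile(self.pattern)
        return self._compiled.match(url) is not None


def find_extractor(extractors, url):
    """Return the first matching extractor and how many candidates were tried."""
    for tried, extractor in enumerate(extractors, start=1):
        if extractor.suitable(url):
            return extractor, tried
    return None, len(extractors)


# 1121 made-up site patterns, followed by the one that actually matches.
extractors = [
    FakeExtractor(f'site{i}', rf'https?://(?:www\.)?site{i}\.example/(?P<id>\w+)')
    for i in range(1121)
]
extractors.append(
    FakeExtractor('Youtube', r'https?://(?:www\.)?youtube\.com/watch\?v=(?P<id>[\w-]+)')
)

url = 'https://www.youtube.com/watch?v=xxxxxxxxxxx'

# First lookup: every pattern in front of the match has to be compiled.
start = time.perf_counter()
first, tried_first = find_extractor(extractors, url)
first_lookup = time.perf_counter() - start

# Second lookup: same walk through the list, but the patterns are already compiled.
start = time.perf_counter()
second, tried_second = find_extractor(extractors, url)
second_lookup = time.perf_counter() - start

# Narrowed list: only the matching extractor is left, so one candidate is tried.
only, tried_narrowed = find_extractor([extractors[-1]], url)

print('first lookup :', tried_first, 'tried,', first_lookup, 's')
print('second lookup:', tried_second, 'tried,', second_lookup, 's')
print('narrowed list:', tried_narrowed, 'tried')
```

The absolute timings depend on the machine, so none are quoted here. The point is only that the cost sits in the first pass over a long list.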
This is my quick hack that completely removes 12 seconds from the process on my RPi Zero:
import time
timer = time.time()
from youtube_dl import YoutubeDL
ydl = YoutubeDL({'format': 'bestaudio/best'})
# Get correct info extractor and replace the long existing list
ydl._ies = [ydl.get_info_extractor('Youtube')]
print(time.time() - timer)
timer = time.time()
# Super fast first download, yay!
ydl.download(['https://www.youtube.com/watch?v=xxxxxxxxxxx'])
print(time.time() - timer)
Output:
5.961918592453003
[youtube] xxxxxxxxxxx: Downloading webpage
[download] 100% of 4.09MiB in 00:01
3.7426917552948 <-- way faster!
There may be a cleaner way to do this without overwriting semi-private attributes.
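One candidate, offered as an untested sketch rather than a confirmed recommendation: as far as I can tell, the `YoutubeDL` constructor takes an `auto_init` argument, and passing `auto_init=False` skips registering the default extractors, so a single extractor can then be registered with `add_info_extractor`. Both of these, and the import path `youtube_dl.extractor.youtube.YoutubeIE`, are my assumptions about the installed youtube-dl version. The helper name `make_youtube_only_downloader` is made up for this example.

```python
def make_youtube_only_downloader(params=None):
    """Build a YoutubeDL instance that only knows the YouTube extractor.

    Hypothetical helper, not part of youtube-dl.
    """
    # Imports happen when the helper is called, not when it is defined.
    from youtube_dl import YoutubeDL
    from youtube_dl.extractor.youtube import YoutubeIE

    # Assumption: auto_init=False leaves the extractor list empty.
    ydl = YoutubeDL(params or {}, auto_init=False)
    # Register just the YouTube extractor instead of assigning to ydl._ies.
    ydl.add_info_extractor(YoutubeIE())
    return ydl
```

It would be used like the snippets above: `ydl = make_youtube_only_downloader({'format': 'bestaudio/best'})` followed by `ydl.download([...])`. As with the hack, URLs from other sites would no longer find a matching extractor.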