python multithreading youtube-dl pafy

Multithreading with pafy


I'm trying to use multithreading across multiple pafy instances to fetch several video streams. Here is a simplified version of my code:

import pafy
import threading

def get_playurl(url):
    video = pafy.new(url)
    best = video.getbest()
    playurl = best.url
    return playurl

threads = []

for i in range(5):
    t = threading.Thread(target=get_playurl, args=("https://www.youtube.com/watch?v=erG5rgNYSdk&ab_channel=WeezerVEVO",))
    t.start()
    threads.append(t)

for thread in threads:
    thread.join()

Some threads will get the playurl successfully, whilst some others will raise an ImportError:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "C:\Users\my_name\AppData\Local\Programs\Python\Python39\Lib\threading.py", line 954, in _bootstrap_inner
    self.run()
  File "C:\Users\my_name\AppData\Local\Programs\Python\Python39\Lib\threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\my_name\VSCodeProjects\my_project\stackoverflow_example.py", line 6, in get_playurl
    video = pafy.new(url)
  File "C:\Users\my_name\VSCodeProjects\my_project\.venv\lib\site-packages\pafy\pafy.py", line 122, in new
    from .backend_youtube_dl import YtdlPafy as Pafy
ImportError: cannot import name 'YtdlPafy' from partially initialized module 'pafy.backend_youtube_dl' (most likely due to a circular import) (C:\Users\my_name\VSCodeProjects\my_project\.venv\lib\site-packages\pafy\backend_youtube_dl.py)

Which threads succeed and which raise exceptions seems to be random and differs every time. I'm new to threading, so I'm not sure what the problem is. I don't have any files with the same names as any of these modules, and I've also installed the youtube-dl dependency and updated all the others, which hasn't had any effect.

Any help appreciated :)


Solution

  • Pafy apparently imports its backend lazily: the traceback shows the import of pafy.backend_youtube_dl running inside pafy.new() rather than when import pafy executes. With threads, this leads to a race condition in which two threads try to import the submodule at the same time, and one of them sees it only partially initialized.

    A couple of alternatives I can think of:

    • Add import pafy.backend_youtube_dl to your main script after import pafy, even if you never use that name directly, so the submodule is fully loaded before any thread starts.
    • Use multiprocessing to run multiple processes instead of threads.
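
For reference, here is a minimal sketch of the first alternative, assuming the race really is in the lazy backend import. The helper names preload_backend and fetch_all are made up for this example and are not part of pafy. It also uses a ThreadPoolExecutor instead of raw threading.Thread objects, because a plain Thread discards the value returned by its target, so the playurl in the question's code would never reach the caller.

```python
from concurrent.futures import ThreadPoolExecutor


def preload_backend():
    """Import pafy and its youtube-dl backend once, in the calling thread.

    Hypothetical helper: call it in the main thread before starting workers
    so that no worker triggers the lazy import inside pafy.new().
    """
    import pafy
    import pafy.backend_youtube_dl  # noqa: F401  (imported only for the side effect)


def get_playurl(url):
    """Same logic as in the question: resolve the best stream URL."""
    import pafy  # already cached in sys.modules once preload_backend() has run

    video = pafy.new(url)
    best = video.getbest()
    return best.url


def fetch_all(urls, fetch=get_playurl, max_workers=5):
    """Run fetch(url) for every URL in a thread pool.

    Results come back in the same order as urls. An exception raised in a
    worker is re-raised here instead of only being printed by the thread.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

In the main script you would call preload_backend() once and then something like fetch_all([url] * 5), with url being the video link from the question. If you go for the second alternative instead, concurrent.futures.ProcessPoolExecutor has the same map interface, but the worker function must be defined at module level and, on Windows, the pool has to be started under an if __name__ == "__main__": guard.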