concurrent.futures multithreading with 2 lists as variables


I would like to multi-thread the following working piece of code with concurrent.futures, but nothing I've tried so far works.

def download(song_filename_list, song_link_list):

    with requests.Session() as s:
    
        login_request = s.post(login_url, data= payload, headers= headers)

        for x in range(len(song_filename_list)):

            download_request = s.get(song_link_list[x], headers= download_headers, stream=True)

            if download_request.status_code == 200:
                print(f"Downloading {x+1} out of {len(song_filename_list)}!\n")
                pass
            else:
                print(f"\nStatus Code: {download_request.status_code}!\n")
                sys.exit()

            
            with open (song_filename_list[x], "wb") as file:
                file.write(download_request.content)

The two main variables are song_filename_list and song_link_list.

The first list holds the name of each file and the second holds the respective download links,
so the name and link of each file are located at the same index.
For example: name_of_file1 = song_filename_list[0] and link_of_file1 = song_link_list[0]
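
Because the two lists are index-aligned, each name can be paired with its link using zip instead of an index. A minimal sketch using the values listed under EDIT 2 (the loop variable names are only illustrative):

```python
# Index-aligned lists: item i of one list belongs to item i of the other.
song_filename_list = [
    "100049 Himeringo - Yotsuya-san ni Yoroshiku.osz",
    "1001507 ZUTOMAYO - Kan Saete Kuyashiiwa.osz",
]
song_link_list = [
    "https://osu.ppy.sh/beatmapsets/100049/download",
    "https://osu.ppy.sh/beatmapsets/1001507/download",
]

# zip() yields one (name, link) tuple per position, so no index is needed.
pairs = list(zip(song_filename_list, song_link_list))

for number, (name, link) in enumerate(pairs, start=1):
    print(f"{number} out of {len(pairs)}: {name} <- {link}")
```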


This is the most recent attempt at multi-threading:

def download(song_filename_list, song_link_list):

    with requests.Session() as s:
    
        login_request = s.post(login_url, data= payload, headers= headers)

        x = []
        for i in range(len(song_filename_list)):
            x.append(i)


        with concurrent.futures.ThreadPoolExecutor() as executor:
            executor.submit(get_file, x)


def get_file(x):
    
    download_request = s.get(song_link_list[x], headers= download_headers, stream=True)

    if download_request.status_code == 200:
        print(f"Downloading {x+1} out of {len(song_filename_list)}!\n")
        pass
    else:
        print(f"\nStatus Code: {download_request.status_code}!\n")
        sys.exit()

        
    with open (song_filename_list[x], "wb") as file:
        file.write(download_request.content)

Could someone explain what I am doing wrong?
Nothing happens after the get_file function call:
the program skips all of its code and exits without any errors. Where is my logic wrong?


EDIT 1:

After adding prints around the executor call:

        print(song_filename_list, song_link_list)
        with concurrent.futures.ThreadPoolExecutor() as executor:
            print("Before executor.map")
            executor.map(get_file, zip(song_filename_list, song_link_list))
            print("After executor.map")
            print(song_filename_list, song_link_list)

I also added prints to the start and end of get_file and around its file.write call.

The output is as follows:


Succesfully logged in!

["songs names"] ["songs links"]    <- These are correct.
Before executor.map
After executor.map
["songs names"] ["songs links"]    <- These are correct.

Exiting.

In other words, the values are correct, but executor.map appears to skip get_file entirely.
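
One possible explanation for this symptom, assuming get_file raises as soon as it starts (for example a NameError because s is not defined inside it): executor.map returns a lazy iterator, and an exception raised in a worker is only re-raised when that iterator is consumed. A minimal sketch with a hypothetical stand-in, broken_task, in place of get_file:

```python
import concurrent.futures


def broken_task(item):
    # Hypothetical stand-in for get_file: it fails immediately, the way a
    # function referencing an undefined name would.
    raise NameError("name 's' is not defined")


items = [("a.osz", "link-a"), ("b.osz", "link-b")]

# Results are never consumed, so the errors stay hidden in the futures.
with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(broken_task, items)
silent = True  # reached without any traceback

# Consuming the iterator re-raises the first worker exception.
caught = None
try:
    with concurrent.futures.ThreadPoolExecutor() as executor:
        list(executor.map(broken_task, items))
except NameError as error:
    caught = error

print(f"unconsumed map raised nothing: {silent}")
print(f"consumed map raised: {caught!r}")
```

The same applies to executor.submit: the exception is stored on the returned Future and only surfaces when its result() method is called.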


EDIT 2:

Here are the values used.

  • song_filename_list = ['100049 Himeringo - Yotsuya-san ni Yoroshiku.osz', '1001507 ZUTOMAYO - Kan Saete Kuyashiiwa.osz']

  • song_link_list = ['https://osu.ppy.sh/beatmapsets/100049/download', 'https://osu.ppy.sh/beatmapsets/1001507/download']


EDIT 3:

After some tinkering, it seems that this works:

for i in range(len(song_filename_list)):
    with concurrent.futures.ThreadPoolExecutor() as executor:
        executor.submit(get_file, song_filename_list, song_link_list, i, s)

def get_file(song_filename_list, song_link_list, i, s):
    
    download_request = s.get(song_link_list[i], headers= download_headers, stream=True)

    if download_request.status_code == 200:
        print("Downloading...")
        pass
    else:
        print(f"\nStatus Code: {download_request.status_code}!\n")
        sys.exit()
    
    with open (song_filename_list[i], "wb") as file:
        file.write(download_request.content)

Solution

  • In your download() function you submit the whole list, whereas you should submit each item separately:

    def download(song_filename_list, song_link_list):
        with requests.Session() as s:
            login_request = s.post(login_url, 
                data=payload, 
                headers=headers)
    
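            # Create the pool once; an executor created inside the loop
            # would wait for each download before starting the next one.
            with concurrent.futures.ThreadPoolExecutor() as executor:
                for i in range(len(song_filename_list)):
                    executor.submit(get_file, i)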
    

    You can simplify this with the executor's .map() method, which accepts several iterables and passes one item from each to the function:

    def download(song_filename_list, song_link_list):
      with requests.Session() as session:
        session.post(login_url, 
            data=payload, 
            headers=headers)
    
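      with concurrent.futures.ThreadPoolExecutor() as executor:
        # list() consumes the results, so an exception raised in a
        # worker is re-raised here instead of being silently dropped.
        list(executor.map(get_file, song_filename_list, song_link_list))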
    

    Where the get_file function is:

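    def get_file(song_name, song_link):
      # Each call uses its own session, so nothing is shared between threads.
      with requests.Session() as session:
        download_request = session.get(song_link,
            headers=download_headers,
            stream=True)
        status_code = download_request.status_code
        # Read the body while the session (and its connection) is still open.
        content = download_request.content if status_code == 200 else None
    
      if content is None:
        print(f"\nStatus Code: {status_code}!\n")
        return
    
      # Only write the file when the download actually succeeded.
      with open(song_name, "wb") as file:
        file.write(content)
      print(f"Downloaded {song_name}")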
    

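    This avoids sharing state between threads, which avoids potential data races. Note that a new session does not inherit the logged-in state of the session created in download(), so if the download requires authentication, the login request has to be repeated inside get_file.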

    If you need to monitor how many songs have been downloaded, you can use tqdm, whose tqdm.contrib.concurrent module provides a thread_map wrapper that does exactly this.
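
If adding a dependency is not desired, a progress count can also be sketched with the standard library alone. The example below is only an illustration under stated assumptions, not part of the answer above: fake_download is a hypothetical stand-in for get_file that performs no network access, and download_all is an illustrative name. Calling future.result() also re-raises any exception from the worker, so failures do not go unnoticed.

```python
import concurrent.futures


def fake_download(song_name, song_link):
    # Hypothetical stand-in for get_file; a real version would fetch
    # song_link and write the response body to song_name.
    return f"{song_name} <- {song_link}"


def download_all(names, links):
    finished = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # One future per (name, link) pair, all sharing a single pool.
        futures = [
            executor.submit(fake_download, name, link)
            for name, link in zip(names, links)
        ]
        # as_completed yields each future as soon as it finishes.
        for done, future in enumerate(
            concurrent.futures.as_completed(futures), start=1
        ):
            # result() re-raises any exception raised in the worker.
            finished.append(future.result())
            print(f"Downloaded {done} out of {len(futures)}")
    return finished


results = download_all(["a.osz", "b.osz"], ["link-a", "link-b"])
```

Because as_completed yields futures in completion order, the order of results may differ from the order of the input lists.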