python, threadpool

Can I pass a sequence number through Python's Pool.imap?


I use the code below to download .ts files. The original .ts filenames are too long, which makes merging them into an .mp4 file fail, so I want to save them under sequential names instead: 001.ts, 002.ts, and so on. How can I pass this sequence number to download_ts_file?

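import os
import time
from functools import partial
from multiprocessing.pool import ThreadPool as Pool  # assumed; the question does not show which Pool is imported

import requests
import tqdm
from requests import Response

# `header` is assumed to be a dict of HTTP request headers defined elsewhere.

def download_ts_file(ts_url: str, store_dir: str, attempts: int = 10):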
    ts_fname = ts_url.split('/')[-1]
    ts_dir = os.path.join(store_dir, ts_fname)
    ts_res = None

    for _ in range(attempts):
        try:
            ts_res = requests.get(ts_url, headers=header)
            if ts_res.status_code == 200:
                break
        except Exception:
            pass
        time.sleep(.5)

    if isinstance(ts_res, Response) and ts_res.status_code == 200:
        with open(ts_dir, 'wb+') as f:
            f.write(ts_res.content)
    else:
        print(f"Failed to download streaming file: {ts_fname}.")

pool = Pool(20)
gen = pool.imap(partial(download_ts_file, store_dir='.'), ts_url_list)
for _ in tqdm.tqdm(gen, total=len(ts_url_list)):
    pass
pool.close()
pool.join()

Solution

  • Pool.imap passes one item per call, so use enumerate to pass (sequence number, URL) tuples instead of just the URL, then unpack the tuple inside download_ts_file.

    def download_ts_file(seq_and_ts_url: tuple[int, str], store_dir: str, attempts: int = 10):
        seq, ts_url = seq_and_ts_url
        # seq will be 0, 1, 2, ...; build the filename from it, e.g. f"{seq:03d}.ts"
    
    
    with Pool(20) as pool:
        gen = pool.imap(partial(download_ts_file, store_dir="."), enumerate(ts_url_list))
        for _ in tqdm.tqdm(gen, total=len(ts_url_list)):
            pass
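
To make the naming part concrete, here is a minimal offline sketch of the same idea. It is not the asker's full downloader: `fake_fetch` is a hypothetical stand-in for `requests.get(ts_url).content`, `download_all` is a helper name made up for this example, and a `ThreadPool` is assumed because the question is tagged threadpool. It uses `enumerate(..., start=1)` so the first file is 001.ts, and zero-pads the number so the files sort in download order.

```python
import os
import tempfile
from functools import partial
from multiprocessing.pool import ThreadPool


def fake_fetch(ts_url: str) -> bytes:
    # Hypothetical stand-in for requests.get(ts_url).content,
    # so the sketch runs without network access.
    return ts_url.encode("utf-8")


def download_ts_file(seq_and_ts_url: tuple[int, str], store_dir: str) -> str:
    # imap hands over one item per call, so the pair is unpacked here.
    seq, ts_url = seq_and_ts_url
    # Zero-pad to three digits: 001.ts, 002.ts, ... sort in download order.
    ts_path = os.path.join(store_dir, f"{seq:03d}.ts")
    with open(ts_path, "wb") as f:
        f.write(fake_fetch(ts_url))
    return ts_path


def download_all(ts_url_list: list[str], store_dir: str, workers: int = 4) -> list[str]:
    with ThreadPool(workers) as pool:
        # start=1 makes the first file 001.ts rather than 000.ts.
        jobs = enumerate(ts_url_list, start=1)
        worker = partial(download_ts_file, store_dir=store_dir)
        # imap yields results in input order; list() drains it
        # before the pool is torn down on leaving the with block.
        return list(pool.imap(worker, jobs))


if __name__ == "__main__":
    urls = [f"https://example.com/stream/very-long-segment-name-{i}.ts" for i in range(3)]
    with tempfile.TemporaryDirectory() as tmp:
        paths = download_all(urls, tmp)
        print([os.path.basename(p) for p in paths])
```

Three digits cover up to 999 segments; for longer playlists the pad width would need to grow (for example, derived from `len(str(len(ts_url_list)))`), otherwise lexical order and numeric order diverge.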