
Different behavior using tqdm


I was working on an image-downloading project for a website, but I ran into some strange behavior with tqdm. In the code below I included two options for creating the tqdm progress bar. In Option 1 I did not pass the iterable content from the response into tqdm directly, while in Option 2 I did. Although the code looks similar, the results are strangely different.

This is what the progress bar looks like with Option 1: [screenshot]

This is what the progress bar looks like with Option 2: [screenshot]

Option 1 gives the result I want, but I couldn't find an explanation for the behavior of Option 2. Can anyone explain it?

import requests
from tqdm import tqdm
import os

# Folder to store in
default_path = "D:\\Downloads"


def download_image(url):
    """
    Download the image at the given URL, naming the file after the last URL segment.
    The image is saved to default_path (the Downloads folder).
    """

    # Establish a Session with cookies
    s = requests.Session()

    # pixiv requires a Referer header in order to download images
    response = s.get(url, headers={'User-Agent': 'Mozilla/5.0',
                                   'referer': 'https://www.pixiv.net/'}, stream=True)

    file_name = url.split("/")[-1]  # Retrieve the file name of the link
    together = os.path.join(default_path, file_name)  # Join together path with the file_name. Where to store the file
    file_size = int(response.headers["Content-Length"])  # Get the total byte size of the file
    chunk_size = 1024  # Consuming in 1024 byte per chunk

    # Option 1
    progress = tqdm(total=file_size, unit='B', unit_scale=True, desc="Downloading {file}".format(file=file_name))

    # Open the file destination and write in binary mode
    with open(together, "wb") as f:
        # Loop through the response in chunk_size pieces and update the progress bar
        # with len(chunk), not chunk_size, since the last chunk may be shorter
        for chunk in response.iter_content(chunk_size):
            f.write(chunk)

            progress.update(len(chunk))

    # Option 2
    """progress = tqdm(response.iter_content(chunk_size),total=file_size, unit='B', unit_scale=True, desc="Downloading {file}".format(file = file_name))

    with open(together, "wb") as f:

        for chunk in progress:

            progress.update(len(chunk))
            f.write(chunk)

    """

    # Close the tqdm object; the with block has already closed the file
    progress.close()


if __name__ == "__main__":
    download_image("Image Link")

Solution

  • This looks like an existing tqdm bug: https://github.com/tqdm/tqdm/issues/766

    Option 1:

    • Gives tqdm the total size in bytes.
    • On each iteration, the progress is updated manually with len(chunk), so the bar keeps moving toward the total.
    • Works fine.
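A minimal sketch of the Option 1 pattern, assuming an in-memory list of byte chunks stands in for response.iter_content(chunk_size) so it runs without a network; the helper name write_chunks_with_progress and its bar_file parameter are hypothetical. Because the iterable is not handed to tqdm, only the explicit update(len(chunk)) calls move the bar, so its counter ends exactly at the byte total.

```python
import io

from tqdm import tqdm


def write_chunks_with_progress(chunks, total_bytes, dest, desc="Downloading", bar_file=None):
    """Write byte chunks to dest and advance a byte-based bar by hand.

    chunks stands in for response.iter_content(chunk_size) (assumption);
    bar_file=None makes tqdm print to stderr, its default.
    Returns the bar's final counter value.
    """
    with tqdm(total=total_bytes, unit="B", unit_scale=True,
              desc=desc, file=bar_file) as progress:
        for chunk in chunks:
            dest.write(chunk)
            # Advance by the real length; the last chunk is usually shorter
            progress.update(len(chunk))
        return progress.n


if __name__ == "__main__":
    data = b"x" * 2500
    # Split into 1024-byte chunks: 1024, 1024, 452
    chunks = [data[i:i + 1024] for i in range(0, len(data), 1024)]
    print(write_chunks_with_progress(chunks, len(data), io.BytesIO()))  # 2500
```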

    Option 2:

    • Gives tqdm the total size along with the generator returned by response.iter_content, which tqdm then iterates over.
    • On each iteration, tqdm should advance the bar automatically as it pulls the next item from the generator.
    • However, you also call progress.update manually, which should not be done.
    • Instead, let the wrapped generator do the job.
    • But this doesn't work either, and the issue has already been reported.
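One way to let the wrapped iterable do the job: when tqdm wraps an iterable, it advances the counter by one for every item yielded, so the bar counts chunks, not bytes. If total is given in bytes (as in Option 2) and there are no manual updates, the counter ends near file_size / chunk_size, far short of the total. A minimal sketch, assuming an in-memory list of chunks stands in for response.iter_content(chunk_size) and using a hypothetical helper name, expresses total in chunks so that it matches what tqdm counts.

```python
import io
import math

from tqdm import tqdm


def write_chunks_counting_iterations(chunks, total_bytes, chunk_size, dest, bar_file=None):
    """Let tqdm drive the bar by wrapping the chunk iterable.

    tqdm adds 1 per yielded item, so total is given in chunks, not bytes.
    Returns the bar's final counter value.
    """
    total_chunks = math.ceil(total_bytes / chunk_size)
    with tqdm(chunks, total=total_chunks, unit="chunk", file=bar_file) as progress:
        for chunk in progress:  # no manual progress.update() here
            dest.write(chunk)
    return progress.n


if __name__ == "__main__":
    data = b"x" * 2500
    chunks = [data[i:i + 1024] for i in range(0, len(data), 1024)]
    print(write_chunks_counting_iterations(chunks, len(data), 1024, io.BytesIO()))  # 3
```

The trade-off is that the bar then shows chunk counts instead of byte sizes, which is why Option 1's manual update(len(chunk)) is the usual choice for downloads.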

    Suggestion for Option 1: to avoid closing the streams manually, open them in a with statement. The same applies to tqdm, which can also be used as a context manager.

        # Open the file destination and write in binary mode; both the tqdm bar
        # and the file are closed automatically when the with block exits
        with tqdm(total=file_size,
                  unit='B',
                  unit_scale=True,
                  desc="Downloading {file}".format(file=file_name)
                  ) as progress, open(together, "wb") as f:
            # Loop through the response in chunk_size pieces and update the progress
            # bar with len(chunk), not chunk_size
            for chunk in response.iter_content(chunk_size):
                progress.update(len(chunk))
                f.write(chunk)
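Another way to get Option 1's byte counting without calling update by hand is tqdm.wrapattr, available in recent tqdm releases: it wraps a stream's write method so each write also advances the bar by the length of the data written. A minimal sketch, assuming an in-memory list of chunks stands in for response.iter_content(chunk_size); the helper name write_with_wrapattr is hypothetical.

```python
import io

from tqdm import tqdm


def write_with_wrapattr(chunks, total_bytes, dest, desc="Downloading", bar_file=None):
    """Count bytes by wrapping dest.write instead of calling update() manually."""
    # wrapattr yields a wrapper whose write() forwards to dest.write and then
    # advances the bar by len(data); bytes=True (the default) sets unit "B".
    with tqdm.wrapattr(dest, "write", total=total_bytes,
                       desc=desc, file=bar_file) as fout:
        for chunk in chunks:
            fout.write(chunk)


if __name__ == "__main__":
    data = b"y" * 3000
    chunks = [data[i:i + 1024] for i in range(0, len(data), 1024)]
    write_with_wrapattr(chunks, len(data), io.BytesIO())
```

wrapattr does not close the wrapped stream, so in the download function the file would still be opened in its own with statement, e.g. with open(together, "wb") as f: write_with_wrapattr(response.iter_content(chunk_size), file_size, f).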