Python - Convert bytes buffer to file size


I'm writing a program that calculates the checksum of each file in a list, then compares it against a reference file.

I'm trying to convert the bytes buffer read in the hashfile method into a size in the same units that os.stat(path).st_size uses, so that I can update a tqdm progress bar accordingly. (I'm trying to implement the last example here.)

I tried a number of things, but nothing has worked so far:

  • len(buf): gives me a processed size far greater than the total
  • int.from_bytes(): OverflowError - int too large to convert to float
  • struct.unpack_from(buf): requires reading a single byte at a time
  • various other functions for converting bytes

It seems I don't understand bytes well enough to know what to search for, or how to implement the solutions I find.

Here's an excerpt from the code:

import hashlib
import os
from tqdm import tqdm

# calculate total size to process
self.assets_size += os.stat(os.path.join(root, f)).st_size

def hashfile(self, progress, afile, hasher, blocksize=65536):
    """
    Checksum buffer
    :param progress: progress bar object
    :param afile: file to process
    :param hasher: checksum algorithm
    :param blocksize: size of the buffer
    :return: hash digest
    """
    buf = afile.read(blocksize)

    while len(buf) > 0:
        self.processed_size += buf  # need to convert from bytes to file size
        hasher.update(buf)
        progress.update(self.processed_size)  # tqdm update
        buf = afile.read(blocksize)

    afile.seek(0)
    return hasher.digest()

def process_file(self, progress, fichier):
    """
    Checks if the file is in the reference dictionary;
    If so, checks if the size of the file matches the one stored in the dictionary;
    If so, calculates the checksum of the file and compares it to the one in the dictionary
    :param progress: progress bar object
    :param fichier: asset file to process
    :return: string outcome of the process
    """
    checksum = self.hashfile(progress, open(fichier, 'rb'), hashlib.sha1())
    # check if checksum matches
    return outcome

def main_process(self):
    """
    Launches and monitors the process and writes a report of the results
    :return: application end
    """
    with tqdm(total=self.assets_size, unit='B', unit_scale=True) as pbar:
        all_results = []

        for f in self.assets.keys():
            results = self.process_file(pbar, f)
            all_results.append(results)

    for r in all_results:
        print(r)

Solution

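  • Found the solution thanks to @RadosławCybulski. I now understand how the tqdm.update() function works: it doesn't set the progress state to its argument, it adds the argument to the current progress. So len(buf) was already the right value (the number of bytes in the chunk, the same unit as st_size); the mistake was passing a running total on every call. I updated the loop in the hashfile method like so: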

        while len(buf) > 0:
            hasher.update(buf)
            progress.update(len(buf))
            buf = afile.read(blocksize)
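
For anyone who wants to check the behaviour without a real file or tqdm installed, here is a minimal sketch of the corrected method as a standalone function. ByteCounter is a hypothetical stand-in I made up for illustration; it only mimics the additive update(n) behaviour described above. The 200,000-byte payload is also made up.

```python
import hashlib
import io


class ByteCounter:
    """Hypothetical stand-in for a tqdm bar: update(n) adds n to the running total."""

    def __init__(self, total):
        self.total = total
        self.n = 0

    def update(self, n=1):
        self.n += n


def hashfile(progress, afile, hasher, blocksize=65536):
    """Hash a binary file object in chunks, advancing progress by each chunk's size."""
    buf = afile.read(blocksize)
    while len(buf) > 0:
        hasher.update(buf)
        progress.update(len(buf))  # pass the increment, not a cumulative total
        buf = afile.read(blocksize)
    afile.seek(0)
    return hasher.digest()


data = b"x" * 200_000  # made-up payload standing in for a real file
counter = ByteCounter(total=len(data))
digest = hashfile(counter, io.BytesIO(data), hashlib.sha1())

# The per-chunk increments sum to exactly the payload size in bytes.
assert counter.n == counter.total
```

With tqdm installed, the same function should work unchanged with a real bar, e.g. `with tqdm(total=len(data), unit='B', unit_scale=True) as pbar: hashfile(pbar, io.BytesIO(data), hashlib.sha1())`.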