python google-cloud-platform google-cloud-storage

Limit download rate from Google Cloud Storage (Python library)


I would like to be able to limit the rate of a blob download from Google Cloud Storage in Python.

I could not find any indication that this is possible using the official Python library or the alternative GCSFS library.

My best guess so far is to download slices of the blob using the start and end arguments of download_as_bytes() and to control the timing between slice requests. However, 1) I would prefer a built-in solution if one exists, and 2) I am not sure this is the best approach.

Does anybody have a built-in solution or a better approach?


Solution

  • A slightly modified version of Herlandro Hermogenes' answer (https://stackoverflow.com/a/79067945/27700804)

    import time
    from google.cloud import storage
    
    def download_blob_rate_limited(blob, dest_file, rate_limit=512*1024, freq=10):
        """
        Download a blob from Google Cloud Storage with a rate limit.
        
        Arguments:
        blob       -- the storage.Blob object to download
        dest_file  -- path of the destination file to write
        rate_limit -- the rate limit in B/sec (default: 512*1024 = 512 KiB/sec)
        freq       -- how often the rate limit is enforced, in 1/sec (default: 10 times per second)
        """
        chunk_size = max(1, int(rate_limit / freq))
        interval = 1 / freq
        
        if blob.size is None:
            blob.reload()  # fetch metadata so that blob.size is known
        
        with open(dest_file, 'wb') as file_obj:
            start = 0
            blob_size = blob.size
            while start < blob_size:
                chunk_start_time = time.monotonic()
                # `end` is the last byte downloaded (inclusive), hence the -1
                end = min(start + chunk_size, blob_size) - 1
                chunk = blob.download_as_bytes(start=start, end=end)
                file_obj.write(chunk)
                chunk_download_time = time.monotonic() - chunk_start_time
                # Pad each chunk to a full interval to hold the target rate
                if chunk_download_time < interval:
                    time.sleep(interval - chunk_download_time)
                start = end + 1
    
    # Usage example ('my-bucket' and 'my-blob' are placeholder names)
    client = storage.Client()
    blob = client.bucket('my-bucket').get_blob('my-blob')  # get_blob() also loads the blob size
    download_blob_rate_limited(blob, 'local_file.txt', rate_limit=10*1024*1024)
        
    

    I find it easier to specify how many times per second I want to download chunks (and thus at what frequency I want to enforce the rate limit) rather than figuring out the chunk size needed to achieve that.
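To make the relationship between `rate_limit`, `freq`, and the resulting chunk size concrete, here is a small sketch of the arithmetic. The helper name `chunk_plan` is hypothetical; it is only an illustration and is not part of the answer's function or of any library.

```python
def chunk_plan(rate_limit, freq):
    """Return (chunk_size_bytes, interval_seconds) for a target rate.

    rate_limit -- target rate in bytes per second
    freq       -- number of chunk requests per second
    """
    # Each interval of 1/freq seconds may carry rate_limit/freq bytes.
    chunk_size = max(1, int(rate_limit / freq))
    interval = 1.0 / freq
    return chunk_size, interval


# 10 MiB/sec enforced 10 times per second: 1 MiB chunks every 0.1 sec
chunk_size, interval = chunk_plan(10 * 1024 * 1024, 10)
print(chunk_size, interval)  # 1048576 0.1
```

A higher `freq` gives smoother throughput but more requests, each with its own fixed overhead; a lower `freq` gives fewer, larger bursts.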

    Also, the sleep times have been modified to better match the target download rate. With this version, I could download at 9.6 MB/sec for a target rate of 10 MB/sec, i.e. about 4% below the target (the target rate was much lower than the connection capacity, which is >50 MB/sec), so the overhead is small.
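One plausible reason for the remaining gap is that each chunk is timed in isolation, so a chunk that runs long (or a sleep that overshoots) is never made up later. A possible refinement, sketched below under that assumption, is to pace against absolute deadlines instead. The names `paced_download`, `fetch`, `clock`, and `sleep` are hypothetical and introduced only for illustration; `fetch(start, end)` stands for any callable that downloads and stores the inclusive byte range.

```python
import time


def paced_download(fetch, total_size, rate_limit, freq=10,
                   clock=time.monotonic, sleep=time.sleep):
    """Call fetch(start, end) on consecutive inclusive byte ranges,
    sleeping so that cumulative throughput stays at or below rate_limit.

    fetch      -- hypothetical callable that downloads bytes start..end
    total_size -- total number of bytes to download
    rate_limit -- target rate in bytes per second
    """
    chunk_size = max(1, int(rate_limit / freq))
    t0 = clock()
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1  # inclusive end
        fetch(start, end)
        start = end + 1
        # Earliest time at which `start` bytes may have been transferred.
        deadline = t0 + start / rate_limit
        delay = deadline - clock()
        if delay > 0:
            sleep(delay)
```

With a blob and an open file, `fetch` could be something like `lambda s, e: file_obj.write(blob.download_as_bytes(start=s, end=e))`. Because the deadline is computed from the total bytes transferred so far, a slow chunk is followed by shorter sleeps until the average rate catches up; the trade-off is that short bursts above the target rate can occur while it does.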