I would like to limit the rate of a blob download from Google Cloud Storage in Python.
I could not find any indication that this is possible with the official Python client library or with the alternative GCSFS library.
My best guess so far would be to implement it by downloading slices of the blob using the start and end arguments of download_as_bytes() and controlling the timing between slice requests, roughly as in the sketch below. But 1) I would prefer a built-in solution if possible, and 2) I am not sure this would be the best solution.
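Something along these lines (a rough sketch; 'my-bucket' and 'my-object' are placeholder names, and the blob is fetched with get_blob() so its size is known):

import time
from google.cloud import storage

# Rough sketch of the idea: fetch the blob in fixed-size slices with
# download_as_bytes(start=..., end=...) and sleep between range requests.
client = storage.Client()
blob = client.bucket('my-bucket').get_blob('my-object')  # placeholder names

chunk_size = 256 * 1024  # bytes per range request
with open('local_file', 'wb') as f:
    start = 0
    while start < blob.size:
        end = min(start + chunk_size, blob.size) - 1  # 'end' is inclusive
        f.write(blob.download_as_bytes(start=start, end=end))
        start = end + 1
        time.sleep(0.25)  # crude fixed pacing between slice requests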
Does anybody have a built-in solution or a better approach?
A slightly modified version of Herlandro Hermogenes' answer (https://stackoverflow.com/a/79067945/27700804):
import time
from google.cloud import storage

def download_blob_rate_limited(blob, dest_file, rate_limit=512*1024, freq=10):
    """
    Download a blob from Google Cloud Storage with a rate limit.

    Arguments:
    blob -- the blob to download (a google.cloud.storage.Blob object)
    dest_file -- destination file to write
    rate_limit -- the rate limit in B/sec (default: 512*1024 = 512 KB/sec)
    freq -- the frequency at which the rate limit is enforced, in 1/sec
            (default: 10 times per second)
    """
    chunk_size = int(rate_limit / freq)
    if blob.size is None:
        blob.reload()  # fetch the blob's metadata so blob.size is populated
    blob_size = blob.size
    with open(dest_file, 'wb') as file_obj:
        start = 0
        while start < blob_size:
            chunk_start_time = time.time()
            # 'end' is inclusive in download_as_bytes(), hence the -1
            end = min(start + chunk_size, blob_size) - 1
            chunk = blob.download_as_bytes(start=start, end=end)
            file_obj.write(chunk)
            chunk_download_time = time.time() - chunk_start_time
            # Each iteration should last at least 1/freq seconds, so that
            # chunk_size bytes per 1/freq seconds averages to rate_limit B/sec
            if chunk_download_time < 1 / freq:
                time.sleep(1 / freq - chunk_download_time)
            start = end + 1

# Usage example: pass a Blob object, not just the blob name
client = storage.Client()
blob = client.bucket('my-bucket').get_blob('my-blob')
download_blob_rate_limited(blob, 'local_file.txt', rate_limit=10*1024*1024)
I find it easier to specify how many times per second I want to download chunks (and thus at what frequency the rate limit is enforced) than to figure out the chunk size that would achieve that.
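For example, with the defaults (rate_limit=512*1024 B/sec, freq=10), each request fetches int(512*1024/10) ≈ 51 KB; with rate_limit=10*1024*1024 as in the usage example, each chunk is 1 MB.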
Also, the sleep times have been modified to better match the target download rate. With this version, I could download at 9.6 MB/sec for a target rate of 10 MB/sec (a target well below the connection capacity, which is >50 MB/sec): the overhead is not that big!