I am testing the throughput of writing to S3 from a Python AWS Glue shell job using the upload_fileobj function of the boto3 client. The documented input to this function is:

Fileobj (a file-like object) -- A file-like object to upload. At a minimum, it must implement the read method, and must return bytes.

In order to have the test isolate just the throughput, as opposed to memory or CPU capabilities, I think the best way to use upload_fileobj would be to pass an iterator that produces N bytes of the value 0.

In Python, how can a "file-like object" be created from an iterator? I'm looking for something of the form:
from itertools import repeat
number_of_bytes = 1024 * 1024
zero_iterator = repeat(b'\0', number_of_bytes)
file_like_object = something(zero_iterator) # fill in 'something'
This file-like object would then be passed to boto3 for writing:

session.client('s3').upload_fileobj(file_like_object, Bucket='my_bucket', Key='my_key')
Thank you in advance for your consideration and response.
This is a simplified version of the answer at https://stackoverflow.com/a/70547492/1319998, since we only need to deal with bytes, and so it should be suitable for boto3's upload_fileobj:
def to_file_like_obj(iterable):
    chunk = b''
    offset = 0
    it = iter(iterable)

    def up_to_iter(size):
        # Yield up to `size` bytes from the underlying iterable, pulling in
        # new chunks as needed and remembering the position within the
        # current chunk between calls
        nonlocal chunk, offset

        while size:
            if offset == len(chunk):
                try:
                    chunk = next(it)
                except StopIteration:
                    break
                else:
                    offset = 0
            to_yield = min(size, len(chunk) - offset)
            offset = offset + to_yield
            size -= to_yield
            yield chunk[offset - to_yield:offset]

    class FileLikeObj:
        def read(self, size=-1):
            # A negative or None size means "read everything", as with files
            return b''.join(up_to_iter(float('inf') if size is None or size < 0 else size))

    return FileLikeObj()
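As a quick sanity check of the read semantics, here is a minimal sketch (the sample chunks and sizes are arbitrary, and it assumes the function above is in scope) showing that read spans chunk boundaries and behaves like a file at EOF:

pieces = [b'abc', b'def', b'ghi']
f = to_file_like_obj(pieces)
print(f.read(4))   # b'abcd' -- spans the boundary between the first two chunks
print(f.read())    # b'efghi' -- a negative or omitted size reads the rest
print(f.read(4))   # b'' -- exhausted, like a file at EOF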
If you have an iterable that yields bytes, say my_iterable, this can be used with boto3 as follows:
target_obj = boto3.Session().resource('s3').Bucket('my-target-bucket').Object('my/target/key')
target_obj.upload_fileobj(to_file_like_obj(my_iterable))
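Tying this back to the throughput test in the question, a sketch along these lines could generate the zero-valued bytes in larger chunks and upload them with the client API. The bucket name, key, chunk size and total size below are placeholders, not recommendations:

import boto3
from itertools import repeat

chunk_size = 1024 * 1024           # 1 MiB per chunk
number_of_chunks = 1024            # 1 GiB in total

# Yield zero-valued bytes in 1 MiB chunks rather than one byte at a time,
# so the iterator itself adds as little overhead as possible
zero_iterator = repeat(b'\0' * chunk_size, number_of_chunks)

session = boto3.Session()
session.client('s3').upload_fileobj(
    to_file_like_obj(zero_iterator),
    Bucket='my_bucket',            # placeholder bucket
    Key='throughput-test/zeros',   # placeholder key
)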