Tags: python, python-3.x, iterator, boto3

How To Create a Python file-like Object from an Iterator


I am testing the throughput of writing to S3 from an AWS Glue Python shell job, using the upload_fileobj method of the boto3 S3 client. Its first parameter is documented as:

Fileobj (a file-like object) -- A file-like object to upload. At a minimum, it must implement the read method, and must return bytes.
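As an illustration of that minimum interface, here is a sketch of an object that implements nothing but read() and serves a fixed number of zero bytes. ZeroReader is a hypothetical name, and the sketch assumes that returning b'' signals end of stream, as it does for ordinary Python files.

```python
class ZeroReader:
    """Hypothetical read-only file-like object serving `total` zero bytes.

    It implements only read(), the minimum the upload_fileobj docs ask for.
    """

    def __init__(self, total: int) -> None:
        self._remaining = total  # bytes not yet returned

    def read(self, size: int = -1) -> bytes:
        # A negative or None size means "read everything that is left".
        if size is None or size < 0:
            size = self._remaining
        n = min(size, self._remaining)
        self._remaining -= n
        return b"\x00" * n  # b'' once exhausted signals end of stream


reader = ZeroReader(10)
print(reader.read(4))       # b'\x00\x00\x00\x00'
print(len(reader.read()))   # 6
print(reader.read())        # b''
```

Each call allocates only the bytes requested, so memory stays bounded as long as the caller reads in chunks; a bare read() on a very large total would allocate all of it at once.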

To make the test isolate network throughput from memory or CPU limitations, I think the best input for upload_fileobj would be a file-like object backed by an iterator that produces N bytes with the value 0.
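One way to check that the data source itself is not the limiting factor is to drain it locally with chunked read() calls and time that, before timing the real upload. A minimal sketch: read_throughput is a hypothetical helper, and its 8 MiB default chunk size is an arbitrary assumption rather than a claim about how boto3 reads.

```python
import io
import time


def read_throughput(fileobj, chunk_size: int = 8 * 1024 * 1024):
    """Drain `fileobj` with fixed-size read() calls and time it.

    Returns (total_bytes, elapsed_seconds). If this is much faster than the
    S3 upload, producing the bytes is unlikely to be the bottleneck.
    """
    total = 0
    start = time.perf_counter()
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:  # b'' means end of stream
            break
        total += len(chunk)
    return total, time.perf_counter() - start


# Example with an in-memory stand-in for the real file-like object.
total, seconds = read_throughput(io.BytesIO(b"\x00" * (32 * 1024 * 1024)))
print(f"{total} bytes in {seconds:.4f} s")
```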

In Python, how can a "file-like object" be created from an iterator?

I'm looking for something of the form:

    from itertools import repeat

    number_of_bytes = 1024 * 1024

    # b'\x00' is the zero byte; b'0' would be the ASCII digit '0' (0x30)
    zero_iterator = repeat(b'\x00', number_of_bytes)

    file_like_object = something(zero_iterator)  # fill in 'something'

which would then be passed to boto3 for writing:

    # Key is required by upload_fileobj; bucket and key names are placeholders
    session.client('s3').upload_fileobj(file_like_object, Bucket='my-bucket', Key='my/key')
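With repeat(..., number_of_bytes) as above, the iterator yields number_of_bytes separate one-byte objects, so any wrapper that reads from it does Python-level work per byte, which could itself cap the measured throughput. A sketch of a chunked alternative follows; zero_chunks is a hypothetical helper and the 1 MiB default chunk size is an arbitrary assumption.

```python
from itertools import repeat


def zero_chunks(total_bytes: int, chunk_size: int = 1024 * 1024):
    """Yield `total_bytes` zero bytes in chunks of at most `chunk_size`.

    Full chunks come from itertools.repeat, so one bytes object is reused
    for all of them; only the final partial chunk is built separately.
    """
    full, rest = divmod(total_bytes, chunk_size)
    yield from repeat(b"\x00" * chunk_size, full)
    if rest:
        yield b"\x00" * rest


chunks = list(zero_chunks(10, chunk_size=4))
print([len(c) for c in chunks])  # [4, 4, 2]
```

For 1 MiB total with 1 MiB chunks this produces a single chunk instead of 1,048,576 one-byte ones.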

Thank you in advance for your consideration and response.


Solution

  • This is a simplified version of the answer at https://stackoverflow.com/a/70547492/1319998. Since we only need to handle bytes, the code can be simpler, and it should be suitable for boto3's upload_fileobj, which only requires a read method that returns bytes.

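    def to_file_like_obj(iterable):
        chunk = b''   # current chunk taken from the iterable
        offset = 0    # index of the next unread byte within chunk
        it = iter(iterable)
    
        def up_to_iter(size):
            # Yield slices of the iterable's chunks totalling at most `size` bytes.
            nonlocal chunk, offset
    
            while size:
                if offset == len(chunk):
                    # Current chunk used up: fetch the next one, or stop at the end.
                    try:
                        chunk = next(it)
                    except StopIteration:
                        break
                    else:
                        offset = 0
                to_yield = min(size, len(chunk) - offset)
                offset += to_yield
                size -= to_yield
                yield chunk[offset - to_yield:offset]
    
        class FileLikeObj:
            def read(self, size=-1):
                # A negative or None size means "read until the iterable is exhausted".
                return b''.join(up_to_iter(float('inf') if size is None or size < 0 else size))
    
        return FileLikeObj()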
    

    If you have an iterable that yields bytes, say my_iterable, this can be used with boto3 as follows:

    import boto3

    target_obj = boto3.Session().resource('s3').Bucket('my-target-bucket').Object('my/target/key')
    target_obj.upload_fileobj(to_file_like_obj(my_iterable))
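A different technique, not the one in the answer above, is to build on the standard library's io module: subclass io.RawIOBase, implement readinto(), and wrap the result in io.BufferedReader, which then supplies read(size), read() and buffering. A sketch, assuming the iterable yields bytes; IterStream and iter_to_file_like are hypothetical names.

```python
import io
from typing import Iterable, Iterator


class IterStream(io.RawIOBase):
    """Read-only raw stream over an iterable of bytes chunks (hypothetical)."""

    def __init__(self, iterable: Iterable[bytes]) -> None:
        self._it: Iterator[bytes] = iter(iterable)
        self._leftover = b""  # unread tail of the current chunk

    def readable(self) -> bool:
        return True

    def readinto(self, buffer) -> int:
        # Fetch the next non-empty chunk if nothing is left over.
        while not self._leftover:
            try:
                self._leftover = next(self._it)
            except StopIteration:
                return 0  # end of stream
        n = min(len(buffer), len(self._leftover))
        buffer[:n] = self._leftover[:n]
        self._leftover = self._leftover[n:]
        return n


def iter_to_file_like(iterable: Iterable[bytes]) -> io.BufferedReader:
    # BufferedReader provides read(size) and read() on top of readinto().
    return io.BufferedReader(IterStream(iterable))


f = iter_to_file_like([b"ab", b"", b"cde"])
print(f.read(3))  # b'abc'
print(f.read())   # b'de'
```

Because BufferedReader.read(size) keeps calling readinto() until it has size bytes or reaches the end of the stream, callers see full-sized reads even when the iterable's chunks are small or empty. The stream also reports seekable() as False, which is accurate for a one-pass iterator.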