Search code examples
pythonboto3python-asynciopython-contextvars

Trouble with async context and structuring my code


I am using a library that requires async context(aioboto3). My issue is that I can't call methods from outside the async with block on my custom S3StreamingFile instance. If I do so, python raises an exception, telling me that HttpClient is None.

I want to access the class methods of S3StreamingFile from an outer function, for example in a API route. I don't want to return anything more(from file_2.py) than the S3StreamingFile class instance to the caller(file_3.py). The aioboto3 related code can't be moved to file_3.py. file_1.py and file_2.py need to contain the aioboto3 related logic.

How can I solve this?

Example of not working code:

# file_1.py
class S3StreamingFile():
    def __init__(self, s3_object):
        self.s3_object = s3_object
    async def size(self):
        return await self.s3_object.content_length # raises exception, HttpClient is None
...

# file_2.py
async def get_file():
    async with s3.resource(...) as resource:
        s3_object = await resource.Object(...)
        s3_file = S3StreamingFile(s3_object)
        return s3_file

# file_3.py
async def main()
    s3_file = await get_file()
    size = await s3_file.size() # raises exception, HttpClient is None

Example of working code:

# file_1.py
class S3StreamingFile():
    def __init__(self, s3_object):
        self.s3_object = s3_object
    async def size(self):
        return await self.s3_object.content_length
...

# file_2.py
async def get_file():
    async with s3.resource(...) as resource:
        s3_object = await resource.Object(...)
        s3_file = S3StreamingFile(s3_object)
        size = await s3_file.size() # works OK here, HttpClient is available
        return s3_file

# file_3.py
async def main()
    s3_file = await get_file()

Solution

  • I want to access the class methods from an outer function... how do I solve this?

    Don't. This library is using async context managers to handle resource acquisition/release. The whole point about the context manager is that things like s3_file.size() only make sense when you have acquired the relevant resource (here the s3 file instance).

    But how do you use this data in the rest of your program? In general---since you haven't said what the rest of your program is or why you want this data---there are two approaches:

    • acquire the resource somewhere else, and then make it available in much larger scopes, or
    • make your other functions resource-aware.

    In the first case, you'd acquire the resource before all the logic runs, and then hold on to it. (This might look like RAII.) This might well make sense in smaller scripts, or when a resource is designed to be held by only one process at a time. It's a poor fit for code which will spend most of its time doing nothing, or has to coexist with other users of the resource. (An extension of this is writing your own code as a context manager, and effectively moving the problem up the calling stack. If each code path only handles one resource, this might well be the way to go.)

    In the second, you'd write your higher-level functions to be aware that they're accessing a resource. You might do this by passing the resource itself around:

    def get_file(resource: AcquiredResource) -> FileResource:
        ...
    
    def get_size(thing: AcquirableResource) -> int:
        with thing as resource:
            s3_file = get_file(resource)
            return s3_file.size
    

    (using made-up generic types here to illustrate the point).

    Or you might want a static copy of (some) attrs of a particular resource, like a file here, and a step where you build that copy. Personally I would likely store those in a dict or local object to make it clear that I wasn't handling the resource itself.

    The basic idea here is that the with block guards access to a potentially difficult-to-acquire resource. That safety is built into the library, but it comes at the cost of having to think about the acquisition and structure it into the flow of your code.