Tags: python, garbage-collection, python-asyncio, python-contextvars

ContextVar memory leak


The following code appears to leak memory, and I don't understand what still holds references to the MyObj instances. Both run(1) and run(2) have finished, and the context is cleared.

import asyncio
import gc
from contextvars import ContextVar

import objgraph

ctx = ContextVar('ctx')

class MyObj:
    def __init__(self, value):
        self.value = value


async def run(value):
    ctx.set(MyObj(value))

async def main():
    await asyncio.gather(run(1), run(2))
    gc.collect()
    print('# of MyObj=', objgraph.count('MyObj'))
    for obj in objgraph.by_type('MyObj'):
        print('MyObj.value=', obj.value)
    print('outer ctx=', ctx.get(None))
    objgraph.show_backrefs(objgraph.by_type('MyObj'), filename='ctx.png', max_depth=5)

asyncio.run(main())

Output:

# of MyObj= 2
MyObj.value= 2
MyObj.value= 1
outer ctx= None

[Image: objgraph back-reference graph for the MyObj instances (ctx.png)]

Why are there still 'finished tasks' even though I manually called gc.collect()?



Solution

  • In your case the finished tasks had not been cleaned up yet. You can see references to them in asyncio.tasks._all_tasks (a private attribute, so its name and contents may differ between Python versions):

    from asyncio import tasks
    
    async def main():
        # gather() wraps each coroutine in its own Task
        coro1, coro2 = run(1), run(2)
        await asyncio.gather(coro1, coro2)
        
        # _all_tasks is private; inspect it for debugging only
        for task in tasks._all_tasks:
            print(task, task.done())
    

    This code will output three tasks:

    <Task finished name='Task-2' coro=<run() done...> True
    <Task finished name='Task-3' coro=<run() done...> True
    <Task pending name='Task-1' coro=<main() running ...> False
    

    So, because the finished tasks are still alive at that point (they still appear in tasks._all_tasks), their contexts, and the MyObj instances stored in them, have not been released yet. I guess this is one of the cases where Python's memory management is not very efficient. You can solve this problem in one of the following ways:

    Solution 1
    You can add a short asyncio.sleep() after the gather() call. This gives the event loop another iteration, after which the garbage can be collected.

    async def main():
        await asyncio.gather(run(1), run(2))
        await asyncio.sleep(0.01)  # short sleep: yield to the event loop
        gc.collect()
    

    Solution 2
    Another option is to add an extra wrapper coroutine around asyncio.gather. This also allows the memory held by the context to be cleaned up properly.

    import asyncio
    import gc
    from contextvars import ContextVar
    
    import objgraph
    
    ctx = ContextVar('ctx')
    
    class MyObj:
        def __init__(self, value):
            self.value = value
    
    
    async def run(value):
        ctx.set(MyObj(value))
    
    async def both_run():
        task1 = asyncio.create_task(run(1))
        task2 = asyncio.create_task(run(2))
        await asyncio.gather(task1, task2)
        
    
    async def main():
        task = asyncio.create_task(both_run())
        await asyncio.gather(task)
    
        print('# of MyObj=', objgraph.count('MyObj'))
        for obj in objgraph.by_type('MyObj'):
            print('MyObj.value=', obj.value)
        print('outer ctx=', ctx.get(None))
        objgraph.show_backrefs(objgraph.by_type('MyObj'), filename='ctx.png', max_depth=5)
    
    asyncio.run(main())
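
As a footnote to the explanation above: asyncio runs each task in its own copy of the current context, so a value set inside a task stays reachable for as long as something still references that task's context. The sketch below illustrates that mechanism with the standard library only, without asyncio or objgraph. Here contextvars.copy_context() is a stand-in for the copy a task gets, and the names set_value, demo and task_context are made up for this illustration, not taken from the question.

```python
import contextvars
import gc
import weakref

ctx = contextvars.ContextVar('ctx')


class MyObj:
    def __init__(self, value):
        self.value = value


def set_value(value):
    """Mimic run(): store a MyObj in ctx, hand back only a weak reference."""
    obj = MyObj(value)
    ctx.set(obj)
    return weakref.ref(obj)


def demo():
    # Assumption: this copy plays the role of the context a task runs in.
    task_context = contextvars.copy_context()
    ref = task_context.run(set_value, 1)

    outer_value = ctx.get(None)           # the outer context never saw the set()
    held_value = task_context[ctx].value  # the copy still holds the MyObj
    alive_while_held = ref() is not None

    # Drop the only reference to the copy, then check the weak reference again.
    del task_context
    gc.collect()
    alive_after_release = ref() is not None
    return outer_value, held_value, alive_while_held, alive_after_release


if __name__ == '__main__':
    print(demo())
```

The first three results match what the question observed: the outer context stays empty while the copy keeps the object alive. The last flag would normally come out False on CPython once the copy is gone, but that depends on interpreter details, so treat it as something to check on your own version rather than a guarantee.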