python memory memory-leaks garbage-collection pyxb

Under what circumstances could an issue that looks like a Python memory leak not be a leak?


We have a Python script, using PyXB and other libraries, that processes large quantities of both XML and JSON data. The script consumes an ever-increasing amount of RAM until the machine runs out of memory.

Apart from a memory leak, are there other circumstances that could cause this high memory usage?


Solution

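  • In our case, the cause of what looked like a leak was that our Python code created garbage faster than the Python garbage collector reclaimed it. CPython frees most objects as soon as their reference count drops to zero, but objects caught in reference cycles stay in memory until the cyclic garbage collector runs, and that collector only runs periodically.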

    The solution in our case was to force a manual garbage collection at the end of each unit of work in our script, as follows:

    import gc

    # at the end of each unit of work:
    gc.collect()

    This brought memory under control.

    We confirmed that the code that seemed to be leaking was not actually leaking by using the tracemalloc library. We forced a garbage collection, took a snapshot, ran the code, forced another collection, took a second snapshot, and compared the two snapshots to show that no additional memory remained allocated.

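    import gc
    import tracemalloc

    # take_snapshot() raises RuntimeError unless tracing has been started
    tracemalloc.start()

    for _ in range(10000):
        # Collect first so the snapshot only reflects live objects
        gc.collect()
        snapshot1 = tracemalloc.take_snapshot()

        # test_parsing() runs the code under test (defined elsewhere)
        response = test_parsing("assets.xml")
        del response

        gc.collect()
        snapshot2 = tracemalloc.take_snapshot()

        # Print only the source lines whose allocated size changed
        top_stats = snapshot2.compare_to(snapshot1, 'lineno')
        print("[ Non Zero differences ]")
        for stat in top_stats:
            if stat.size_diff != 0:
                print(stat)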

    In our case the "Non Zero differences" list was empty after every iteration, which showed that this code was not leaking.
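The point that garbage can sit in memory until the cyclic collector runs can be illustrated with a small experiment. This is a minimal sketch, not the script from the question: `Node`, `build_tree` and `measure_cycle_garbage` are hypothetical names, and the parent/child tree only stands in for whatever objects a parser might create. Automatic collection is disabled during the measurement so that only the explicit `gc.collect()` call frees the cycles; reference counting alone never brings the nodes' counts to zero.

```python
import gc
import tracemalloc


class Node:
    """Hypothetical stand-in for a parsed element that points back to its parent."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.payload = bytearray(1024)  # 1 KiB per node so the garbage is easy to measure
        if parent is not None:
            parent.children.append(self)  # parent <-> child forms a reference cycle


def build_tree(n_children):
    root = Node()
    for _ in range(n_children):
        Node(root)
    return root


def measure_cycle_garbage(n_children=5000):
    """Return traced bytes at baseline, after `del`, and after gc.collect()."""
    gc.collect()
    gc.disable()  # keep the automatic collector from running during the measurement
    tracemalloc.start()
    try:
        baseline, _ = tracemalloc.get_traced_memory()
        tree = build_tree(n_children)
        del tree  # the name is gone, but every node is still part of a cycle
        after_del, _ = tracemalloc.get_traced_memory()
        unreachable = gc.collect()  # the cyclic collector finds and frees the cycles
        after_collect, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
        gc.enable()
    return baseline, after_del, after_collect, unreachable


if __name__ == "__main__":
    baseline, after_del, after_collect, unreachable = measure_cycle_garbage()
    print(f"baseline:      {baseline:>10,} bytes")
    print(f"after del:     {after_del:>10,} bytes")
    print(f"after collect: {after_collect:>10,} bytes ({unreachable} objects collected)")
    print("automatic collection thresholds:", gc.get_threshold())
```

The memory reported after `del` stays high even though nothing can reach the tree any more; it only drops after the collection. With automatic collection enabled, the same memory would be reclaimed whenever the thresholds reported by `gc.get_threshold()` trigger a collection, which is why such growth can look like a leak without being one.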
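The fix described in the answer, a forced collection at the end of each unit of work, can be sketched as a loop that also records traced memory after each collection, so you can check that it levels off instead of climbing. This is a hedged sketch: `process_all` and `parse_document` are hypothetical helpers, and `parse_document` only imitates a unit of work by building a throwaway structure full of reference cycles; in a real script the unit would be whatever processes one XML or JSON input.

```python
import gc
import tracemalloc


def parse_document(n_elements):
    """Hypothetical unit of work: builds a throwaway tree full of reference cycles."""
    root = {"children": []}
    for _ in range(n_elements):
        child = {"parent": root, "data": bytearray(1024)}  # child -> root -> child
        root["children"].append(child)
    return len(root["children"])


def process_all(units, process_unit):
    """Run process_unit() on each item, forcing a full collection after each one.

    Returns the traced memory (bytes) measured right after each collection.
    """
    tracemalloc.start()
    readings = []
    try:
        for unit in units:
            process_unit(unit)
            gc.collect()  # reclaim this unit's cyclic garbage before starting the next
            current, _peak = tracemalloc.get_traced_memory()
            readings.append(current)
    finally:
        tracemalloc.stop()
    return readings


if __name__ == "__main__":
    for i, reading in enumerate(process_all([2000] * 20, parse_document), start=1):
        print(f"after unit {i:2}: {reading:,} bytes still traced")
```

If the readings keep rising even with a collection after every unit, the growth comes from objects that are still reachable, and a snapshot comparison like the one in the answer is the next step for finding where they are allocated.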