Tags: python, pandas, out-of-memory, kedro

Kedro - Memory management


I am working on a Kedro 0.17.2 project that is running into out-of-memory issues, and I'm trying to reduce its memory footprint.

I'm profiling with mprof from the memory-profiler library, and I noticed that there is always a child process present and that memory seems to duplicate in the main process after the first computation in the running node. Is it possible that Kedro is duplicating the dataframes in memory? And, if so, is there a way to avoid this?
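As a sanity check, the sampling behaviour can be observed outside Kedro as well by letting memory_usage from memory-profiler run a pandas-heavy function directly; the sketch below is only an illustration, and build_frame is a hypothetical stand-in for a node's computation, not code from my pipeline:

    # Minimal standalone check (outside Kedro): let memory_usage sample a
    # pandas-heavy function and inspect the recorded samples.
    # build_frame is a hypothetical stand-in for a node's computation.
    import numpy as np
    import pandas as pd
    from memory_profiler import memory_usage


    def build_frame() -> pd.DataFrame:
        # ~800 MB of float64 data, large enough to stand out in the samples.
        return pd.DataFrame(np.random.rand(10_000_000, 10))


    if __name__ == "__main__":
        # memory_usage runs the callable in this process and samples its memory
        # every `interval` seconds; include_children also counts child processes.
        samples = memory_usage(
            (build_frame, (), {}),
            interval=0.5,
            include_children=True,
        )
        print(f"peak: {max(samples):.1f} MiB, final: {samples[-1]:.1f} MiB")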

Notes:

  • I'm using the SequentialRunner
  • I'm not using the is_async CLI option
  • I'm not using either multithreading or multiprocessing in the node execution

[mprof plot: a sampling child process alongside the main process, whose memory increases after the first computation in the node]


Solution

  • It turns out that this issue is caused by a possible bug in the memory-profiler library, which is used by the kedro.extras.decorators.memory_profiler.mem_profile decorator.

    The Kedro decorator makes use of the memory_usage function from the memory_profiler module, which samples the total memory used by the running function from within the Python process.
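    In rough terms, the decorator wraps each node function and lets memory_usage execute it while a monitoring child process records memory samples, then logs the peak. The sketch below is a simplified illustration of that pattern, not the exact Kedro implementation:

        # Simplified sketch of a mem_profile-style decorator (not the exact
        # Kedro source): memory_usage executes the wrapped function, samples
        # memory while it runs, and returns the samples plus the result.
        import logging
        from functools import wraps

        from memory_profiler import memory_usage

        logger = logging.getLogger(__name__)


        def mem_profile(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                # retval=True -> (samples, result); include_children adds any
                # child processes to the sampled total.
                samples, result = memory_usage(
                    (func, args, kwargs),
                    interval=0.1,
                    retval=True,
                    include_children=True,
                )
                logger.info(
                    "Running %r consumed %.2f MiB at peak", func.__name__, max(samples)
                )
                return result

            return wrapper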

    There is an open issue about this problem, but no solution yet: https://github.com/pythonprofilers/memory_profiler/issues/332

    For the moment I have just removed the decorator.
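    Concretely, removing it just means dropping the decorator line from the node function; preprocess_companies below is a hypothetical node, not code from my pipeline:

        # Hypothetical node function: deleting the decorator line is all that
        # "removing the decorator" amounts to; the node logic is untouched.
        import pandas as pd

        # from kedro.extras.decorators.memory_profiler import mem_profile


        # @mem_profile  # removed: this was the only place the profiler was attached
        def preprocess_companies(companies: pd.DataFrame) -> pd.DataFrame:
            # Plain pandas transformation, now running without memory sampling.
            return companies.dropna()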