I am working on a Kedro 0.17.2 project that is running into out-of-memory issues, and I'm trying to reduce its memory footprint.
I'm profiling with mprof from the memory-profiler library, and I noticed that there is always a child process and that memory seems to be duplicated in the main process after the first computation in the running node. Is it possible that Kedro is duplicating the dataframes in memory? If so, is there a way to avoid this?
Notes:
- SequentialRunner
- is_async cli option

It turns out this issue is caused by a possible bug in the memory-profiler library, which is used by the kedro.extras.decorators.memory_profiler.mem_profile decorator.
The Kedro decorator uses the memory_usage function from the memory-profiler module, which samples the total memory used by the Python process while the decorated function runs.
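For comparison, here is a minimal sketch of sampling-based measurement that polls the resident set size from a background thread inside the same process, rather than from a separate sampling process. As far as I can tell from the memory-profiler source, memory_usage runs its sampler in a child process, which would match the child process shown by mprof; treat that as my reading of the library, not a confirmed cause. The names sample_peak_rss and last_peak_rss are hypothetical, and reading /proc/self/statm assumes Linux.

```python
import os
import threading
from functools import wraps


def _current_rss_bytes():
    # Linux-only assumption: the second field of /proc/self/statm is the
    # resident set size of this process, measured in pages.
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def sample_peak_rss(interval=0.01):
    """Hypothetical decorator: poll RSS from a background thread (not a
    child process) while the wrapped function runs and keep the peak."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            peak = _current_rss_bytes()
            done = threading.Event()

            def sampler():
                nonlocal peak
                while not done.is_set():
                    peak = max(peak, _current_rss_bytes())
                    done.wait(interval)  # sleep until the next sample or stop

            thread = threading.Thread(target=sampler, daemon=True)
            thread.start()
            try:
                result = func(*args, **kwargs)
            finally:
                done.set()
                thread.join()
            # Take one last sample in case the function finished between polls.
            peak = max(peak, _current_rss_bytes())
            wrapper.last_peak_rss = peak  # whole-process RSS, in bytes
            return result
        return wrapper
    return decorator


@sample_peak_rss()
def build_list(n):
    return [0] * n


data = build_list(5_000_000)
print(f"peak RSS: {build_list.last_peak_rss / 2**20:.1f} MiB")
```

The reading covers the whole process, not just the function, and its resolution is limited by the polling interval, so short spikes can be missed.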
There is an open issue about this problem, but it has no solution yet: https://github.com/pythonprofilers/memory_profiler/issues/332
For the moment I have just removed the decorator.
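If you still want a per-node number without memory-profiler, one option is Python's built-in tracemalloc module. This is a different technique from what memory-profiler does: it traces allocations made through Python's allocators instead of sampling process memory, so native allocations that bypass those allocators may not be counted, and tracing slows the code down. A minimal sketch; trace_peak_alloc and last_peak_bytes are hypothetical names, not part of Kedro.

```python
import logging
import tracemalloc
from functools import wraps

logger = logging.getLogger(__name__)


def trace_peak_alloc(func):
    """Hypothetical replacement for the removed decorator: record the peak
    memory allocated through Python's allocators while func runs."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        if tracemalloc.is_tracing():
            # Do not clobber an outer measurement; just run the function.
            return func(*args, **kwargs)
        tracemalloc.start()
        try:
            result = func(*args, **kwargs)
            _, peak = tracemalloc.get_traced_memory()  # (current, peak) bytes
        finally:
            tracemalloc.stop()  # also frees the tracing data
        wrapper.last_peak_bytes = peak
        logger.info("%s peak allocation: %.1f MiB", func.__name__, peak / 2**20)
        return result
    return wrapper


@trace_peak_alloc
def make_rows(n):
    return [str(i) for i in range(n)]


rows = make_rows(100_000)
print(f"peak: {make_rows.last_peak_bytes / 2**20:.1f} MiB")
```

Because everything happens in the calling process, no extra process is started, so mprof should not show a sampler child for it.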