Tags: c++, memory-leaks, neural-network, fann

Memory leak training ANN using multiple files


My loop:

for (int i = 1; i <= epochs; ++i) {
    for (std::vector<std::filesystem::path>::iterator it = batchFiles.begin(); it != batchFiles.end(); ++it) {
        struct fann_train_data *data = fann_read_train_from_file(it->string().c_str());
        fann_shuffle_train_data(data);
        float error = fann_train_epoch(ann, data);
    }
}

Here, ann is the network and batchFiles is a std::vector<std::filesystem::path>. The loop iterates over the training-data files in a folder, training the ANN for as many passes as epochs specifies. This line causes a memory leak:

struct fann_train_data *data = fann_read_train_from_file(it->string().c_str());

I must switch between files because I don't have enough memory to load them all at once. Why does the memory leak happen, and how can I fix it?


Solution

  • In C++, memory is automatically freed when the object managing it goes out of scope (assuming the class was written correctly). That idiom is called RAII.

    But FANN presents a C API, not a C++ API. In C, you need to manually free memory when you're done with it. By extension, when a C library creates an object for you, it typically needs you to tell it when you're done with the object. The library doesn't have a good way to figure out on its own when the object's resources should be freed.

    The convention is that whenever a C API gives you a function like struct foo* create_foo(), you should be looking for a corresponding function like void free_foo(struct foo* f). It's symmetrical.
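    This symmetry can be enforced automatically in C++ by wrapping the C API's pointer in a std::unique_ptr with a custom deleter. A minimal, self-contained sketch (the foo type and its create_foo/free_foo functions are hypothetical stand-ins for any create/destroy pair, such as fann_read_train_from_file/fann_destroy_train_data):

    ```cpp
    #include <cassert>
    #include <cstdlib>
    #include <iostream>
    #include <memory>

    // Hypothetical C-style API for illustration only: create_foo/free_foo
    // stand in for a real library's paired create/destroy functions.
    struct foo { int value; };

    struct foo* create_foo() {
        struct foo* f = static_cast<struct foo*>(std::malloc(sizeof(struct foo)));
        f->value = 42;
        return f;
    }

    void free_foo(struct foo* f) {
        std::free(f);
    }

    int main() {
        // unique_ptr with a custom deleter: free_foo is called automatically
        // when f goes out of scope, even on early return or exception.
        std::unique_ptr<struct foo, decltype(&free_foo)> f(create_foo(), &free_foo);
        assert(f->value == 42);
        std::cout << "ok\n";
        return 0;
    }   // f's destructor calls free_foo(f.get()) here
    ```

    This way the destroy call cannot be forgotten, which is exactly the mistake in the question's loop.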

    In your case, as originally noted by PaulMcKenzie, you need void fann_destroy_train_data(struct fann_train_data * train_data). From the documentation, emphasis mine:

    Destructs the training data and properly deallocates all of the associated data. Be sure to call this function after finished using the training data.
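    Applied to the loop from the question, the fix is to destroy each batch after training on it and before loading the next one. A sketch, assuming FANN's documented API (fann_destroy_train_data is the function quoted above; the null check is a defensive addition in case a file cannot be read):

    ```cpp
    #include <fann.h>

    for (int i = 1; i <= epochs; ++i) {
        for (const auto& file : batchFiles) {
            struct fann_train_data *data = fann_read_train_from_file(file.string().c_str());
            if (!data)
                continue;   // file could not be read; nothing to free
            fann_shuffle_train_data(data);
            float error = fann_train_epoch(ann, data);
            fann_destroy_train_data(data);   // free this batch before loading the next
        }
    }
    ```

    Because each batch is freed as soon as training on it finishes, memory usage stays bounded by the size of a single batch file, which is the behavior the question needs.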