Corrupt heap after allocating vector on esp32

I am trying to compute optical flow (Lucas-Kanade based) on an ESP32-CAM. I tried to save memory by operating on only two small array buffers, but I still get a heap corruption error:

test0

bfore allocate out conv

after allocate out conv

bfore allocate out conv

after allocate out conv

bfore allocate out conv

after allocate out conv

bfore allocate out conv

CORRUPT HEAP: multi_heap.c:432 detected at 0x3fff7114

abort() was called at PC 0x40090a7f on core 0

Here is my code. It combines a 1D convolution with a transpose to perform the equivalent separable 2D convolution:

    template<typename T>
    void
    conv(uint8_t *in, const std::vector<T> &g, const int nf) {
        //int const nf = f.size();
        int const ng = g.size();
        int const n  = nf + ng - 1;
        uint8_t *f = in;
        Serial.println("bfore allocate out conv");
        std::vector<T> out(n, T()); // memory leak CORRUPT HEAP
        Serial.println("after allocate out conv");  
        for(auto i(0); i < n; ++i) {
            int const jmn = (i >= ng - 1)? i - (ng - 1) : 0;
            int const jmx = (i <  nf - 1)? i            : nf - 1;
            for(auto j(jmn); j <= jmx; ++j) {
                out[i] += (f[j] * g[i - j]);
            }
        }
        out.erase(out.begin(), out.begin() + ng / 2 + 1);

        // Rescale to 0..255
        auto max = *std::max_element(out.begin(), out.end());
        auto min = *std::min_element(out.begin(), out.end());
        float x;
        for(auto v : out) {
            x = (v - min) * 255.0 / max;
            *(f++) = (uint8_t)x;
        }
        std::vector<T>().swap(out);
    }

    void transpose(uint8_t *f, int w, int h) {
        for(auto i(0); i < h; ++i) 
            for(auto j(0); j < w; ++j) 
                std::swap(f[w * i + j], f[w * j + i]);
    }

    void LK_optical_flow(uint8_t *src1, uint8_t *src2, uint8_t *output, int w, int h)
    {

        Serial.println("test0");

        std::vector<float> Kernel_Dy = {1, 2, 1};
        std::vector<float> Kernel_Dx = {-1, 0, 1};
        std::vector<float> Kernel_Dt = {1/3.0, 1/3.0, 1/3.0};

        uint8_t *fx = src1;
        uint8_t *fy = new uint8_t[w * h];
        uint8_t *ft = src2;

        memcpy(fy, fx, w * h * sizeof(uint8_t));

        // Sobel Dx
        conv(fx, Kernel_Dx, w*h);
        transpose(fx, w, h);
        conv(fx, Kernel_Dy, w*h);
        transpose(fx, w, h);    
        // Sobel Dy
        conv(fy, Kernel_Dy, w*h);
        transpose(fy, w, h);
        conv(fy, Kernel_Dx, w*h);  // memory leak
        transpose(fy, w, h);    
        // Dt
        //conv(src2, Kernel_Dt, w*h);
    ...
    }

Apparently the problem comes from the second buffer I allocated (the one pointed to by fy): it shows up during the second call of conv(fy, ...), when out is allocated as a vector. What am I doing wrong?

Solution

With w and h not being the same, transpose will access and write to out-of-bounds memory.

From your comment, w is 96 and h is about 48. The second argument to swap in transpose, f[w * j + i], reaches indices up to f[w * (w - 1) + (h - 1)], which is far past the w * h elements in the buffer. These writes change memory that does not belong to the buffer and, in your case, corrupt the data the heap implementation uses to keep track of allocated memory. That corruption is only detected during a later allocation or free, so it may not be reported right away. This is why the abort appears while conv allocates out rather than inside transpose.
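To check which indices the loop in transpose reaches, its index arithmetic can be replayed without touching any memory. The sketch below does that; the helper name max_index_touched is made up for illustration and is not part of the question's code.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: replays the index arithmetic of the question's
// transpose loop and returns the largest index it would read or write.
// No buffer is accessed, so this is safe to run for any w and h.
std::size_t max_index_touched(int w, int h) {
    std::size_t max_idx = 0;
    for (int i = 0; i < h; ++i) {
        for (int j = 0; j < w; ++j) {
            std::size_t a = static_cast<std::size_t>(w) * i + j; // first swap argument
            std::size_t b = static_cast<std::size_t>(w) * j + i; // second swap argument
            max_idx = std::max({max_idx, a, b});
        }
    }
    return max_idx;
}
```

For w = 96 and h = 48 this returns 9167, while the last valid index of a w * h buffer is 4607. For a square image (w equal to h) the result stays within bounds.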

The solution involves rewriting transpose to properly transpose a rectangular matrix. (This involves swapping w and h for the returned matrix.)
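One possible way to do this is an out-of-place transpose through a temporary buffer, sketched below. This is only a minimal sketch: the name transpose_rect is hypothetical, the image is assumed to be stored row-major with h rows of w bytes, and the temporary buffer costs another w * h bytes of heap, which may or may not be acceptable on your board. Note also that the original loop visits every (i, j) pair and its mirror (j, i), so even for a square image each pair is swapped twice and the data ends up unchanged.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical replacement for transpose().
// Input:  f holds a row-major image with h rows and w columns.
// Output: f holds the transpose, i.e. w rows and h columns, so the
//         caller must swap its own width and height afterwards.
void transpose_rect(uint8_t *f, int w, int h) {
    if (w <= 0 || h <= 0) return;
    std::vector<uint8_t> tmp(static_cast<std::size_t>(w) * h);
    for (int i = 0; i < h; ++i) {
        for (int j = 0; j < w; ++j) {
            // Element (row i, column j) moves to (row j, column i);
            // rows of the destination are h elements long.
            tmp[static_cast<std::size_t>(j) * h + i] =
                f[static_cast<std::size_t>(i) * w + j];
        }
    }
    std::memcpy(f, tmp.data(), tmp.size());
}
```

With this version, the first call in LK_optical_flow would be transpose_rect(fx, w, h) and the call that transposes back would be transpose_rect(fx, h, w), because the dimensions are swapped after the first call. An in-place rectangular transpose that avoids the temporary buffer is possible but noticeably more involved.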