Tags: c++, c, io, low-level

Reading file with fread() in reverse order causes memory leak?


I have a program that basically does this:

  1. Opens some binary file
  2. Reads the file backwards in 4 MB chunks (by backwards, I mean it starts near EOF and finishes at the beginning of the file, i.e. it reads the file "right-to-left")
  3. Closes the file

My question is: why does memory consumption look like the chart below, even though there are no obvious memory leaks in the attached code?

[Image: memory consumption during program execution]

Here's the source of the program that was run to obtain the above image:

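#include <stdio.h>
#include <string.h>
#include <sys/types.h> //off_t
#include <new>         //std::nothrow

int main(void)
{
    //allocate stuff
    const int bufferSize = 4*1024*1024;
    FILE *fileHandle = fopen("./input.txt", "rb");
    if (!fileHandle)
    {
        fprintf(stderr, "No file for you\n");
        return 1;
    }
    //plain new throws on failure; nothrow makes the NULL check below meaningful
    unsigned char *buffer = new (std::nothrow) unsigned char[bufferSize];
    if (!buffer)
    {
        fprintf(stderr, "No buffer for you\n");
        fclose(fileHandle);
        return 1;
    }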

    //get file size. file can be BIG, hence the fseeko() and ftello()
    //instead of fseek() and ftell().
    fseeko(fileHandle, 0, SEEK_END);
    off_t totalSize = ftello(fileHandle);
    fseeko(fileHandle, 0, SEEK_SET);

    //read the file... in reverse order. This is important.
    for (off_t pos = totalSize - bufferSize, j = 0;
        pos >= 0;
        pos -= bufferSize, j ++)
    {
        if (j % 10 == 0)
        {
            fprintf(stderr,
                "reading like crazy: %lld / %lld\n",
                (long long)pos, (long long)totalSize);
        }

        /*
         * below is the heart of the problem. see notes below
         */
        //seek to desired position
        fseeko(fileHandle, pos, SEEK_SET);
        //read the chunk
        fread(buffer, sizeof(unsigned char), bufferSize, fileHandle);
    }

    fclose(fileHandle);
    delete []buffer;
}

I also have the following observations:

  1. Even though RAM usage jumps by 1GB, the program itself uses only 5MB throughout the whole execution.
  2. Commenting out the call to fread() makes the memory leak go away. This is weird, since I don't allocate anything anywhere near it that could trigger a memory leak...
  3. Also, reading the file normally instead of backwards (= commenting out the call to fseeko()) makes the memory leak go away as well. This is the ultra-weird part.

Further information...

  1. The following doesn't help:
    1. Checking results of fread() - yields nothing out of ordinary.
    2. Switching to normal, 32-bit fseek and ftell.
    3. Doing stuff like setbuf(fileHandle, NULL).
    4. Doing stuff like setvbuf(fileHandle, NULL, _IONBF, *any integer*).
  2. Compiled with g++ 4.5.3 on Windows 7 via cygwin and mingw, without any optimizations, just g++ test.cpp -o test. Both builds show this behaviour.
  3. The file used in tests was 4GB long, full of zeros.
  4. The weird pause in the middle of the chart could be explained with some kind of temporary I/O hangup, unrelated to this question.
  5. Finally, if I wrap the reading in an infinite loop... the memory usage stops increasing after the first iteration.

I think it has to do with some kind of internal cache building up until it's filled with the whole file. How does it really work behind the scenes? How can I prevent that in a portable way?


Solution

  • I think this is more of an OS issue (or even an OS resource-usage reporting issue) than an issue with your program. Of course it only uses 5 MB of memory: 1 MB for itself (libs, stack, etc.) and 4 MB for the buffer. Whenever you call fread(), the OS seems to "bind" the part of the file you just read to your process (most likely through its file cache), and it does not release that memory as quickly as you read. Since memory use on your machine is low, this is not a problem: the OS just keeps the already-read data hanging around longer than necessary, probably assuming that your application might read it again soon, in which case it would not have to do that binding again.

    If memory pressure were higher, the OS would very likely unbind the memory faster, so the jump in your memory usage history would be smaller.
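If you want to nudge the OS into releasing that cached data sooner, one option is to tell it that you are done with each chunk. The following is only a rough sketch, assuming a POSIX system that provides posix_fadvise(); it is not the code from the question. The function name readBackwardsDroppingCache and its error convention are made up for illustration, it uses open()/pread() instead of stdio, and unlike the loop in the question it also reads the short leading chunk at the start of the file. The hint is purely advisory, so the OS is free to ignore it.

```cpp
// Sketch only: reads a file back-to-front and hints the kernel that each
// chunk can be dropped from its cache once it has been consumed.
#include <fcntl.h>      // open, posix_fadvise, POSIX_FADV_DONTNEED
#include <sys/types.h>  // off_t, ssize_t
#include <unistd.h>     // pread, lseek, close
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the total number of bytes read, or -1 on error.
long long readBackwardsDroppingCache(const char *path, std::size_t chunkSize)
{
    assert(chunkSize > 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    off_t totalSize = lseek(fd, 0, SEEK_END);
    if (totalSize < 0)
    {
        close(fd);
        return -1;
    }

    std::vector<unsigned char> buffer(chunkSize);
    long long totalRead = 0;
    off_t end = totalSize;  // one past the last byte that is still unread

    while (end > 0)
    {
        // The chunk nearest the start of the file may be shorter than chunkSize.
        off_t chunk = static_cast<off_t>(chunkSize);
        off_t start = (end > chunk) ? end - chunk : 0;
        std::size_t want = static_cast<std::size_t>(end - start);

        // pread() may return fewer bytes than requested, so loop until done.
        std::size_t got = 0;
        while (got < want)
        {
            ssize_t n = pread(fd, buffer.data() + got, want - got,
                              start + static_cast<off_t>(got));
            if (n <= 0)  // error or unexpected end of file
            {
                close(fd);
                return -1;
            }
            got += static_cast<std::size_t>(n);
        }
        totalRead += static_cast<long long>(got);

        // Advisory hint: the cached pages for [start, end) are no longer
        // needed. The return value is ignored because nothing depends on it.
        posix_fadvise(fd, start, end - start, POSIX_FADV_DONTNEED);

        end = start;
    }

    close(fd);
    return totalRead;
}
```

Calling readBackwardsDroppingCache("./input.txt", 4*1024*1024) would mirror the chunk size used in the question. On Windows, which is what the question was tested on, the rough counterpart would be to open the file with CreateFile and FILE_FLAG_NO_BUFFERING, which comes with its own alignment requirements and is not shown here. Neither approach is portable across both families of systems, so a truly portable program probably has to live with the cache and trust the OS to shrink it under memory pressure.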