How to mmap() a large file without risking the OOM killer?


I've got an embedded ARM Linux box with a limited amount of RAM (512MB) and no swap space, on which I need to create and then manipulate a fairly large file (~200MB). Loading the entire file into RAM, modifying the contents in-RAM, and then writing it back out again would sometimes invoke the OOM-killer, which I want to avoid.

My idea to get around this was to use mmap() to map this file into my process's virtual address space; that way, reads and writes to the mapped memory-area would go out to the local flash-filesystem instead, and the OOM-killer would be avoided, since if memory got low, Linux could just flush some of the mmap()'d memory pages back to disk to free up some RAM. (That might make my program slow, but slow is okay for this use-case.)
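For reference, here is a minimal sketch of the write-through behavior this idea relies on. With a MAP_SHARED file mapping, stores into the mapped area modify the file's own pages rather than private anonymous memory, and msync() forces those dirty pages back to the file. The helper name `WriteThroughSharedMapping` and the scratch-file setup are illustrative assumptions, not part of the original program.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstring>
#include <string>

// Hypothetical helper: writes `text` into a fresh file of `fileSize` bytes via a
// MAP_SHARED mapping, then reads it back with plain pread() to confirm that the
// stores went into the file itself rather than into private process memory.
static bool WriteThroughSharedMapping(const std::string & path, const char * text, size_t fileSize)
{
   char readBack[64] = {0};
   const size_t len = strlen(text);
   if ((len > sizeof(readBack))||(len > fileSize)) return false;

   const int fd = open(path.c_str(), O_CREAT|O_TRUNC|O_RDWR|O_CLOEXEC, S_IRUSR|S_IWUSR);
   if (fd < 0) return false;

   bool ok = false;
   if (ftruncate(fd, (off_t) fileSize) == 0)   // give the file its full size before mapping it
   {
      void * area = mmap(NULL, fileSize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
      if (area != MAP_FAILED)
      {
         memcpy(area, text, len);                                    // store through the mapping
         const bool synced = (msync(area, fileSize, MS_SYNC) == 0);  // write the dirty pages back to the file
         munmap(area, fileSize);

         ok = synced && (pread(fd, readBack, len, 0) == (ssize_t) len) && (memcmp(readBack, text, len) == 0);
      }
   }
   close(fd);
   return ok;
}
```

Note that msync() only forces write-back; when clean file-backed pages actually get evicted is still up to the kernel.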

However, even with the mmap() call, I'm still occasionally seeing processes get killed by the OOM-killer while performing the above operation.

My question is: was I too optimistic about how Linux would behave in the presence of both a large mmap() and limited RAM? (i.e. does mmap()-ing a 200MB file and then reading/writing the mapped memory still require 200MB of available RAM to work reliably?) Or is mmap() clever enough to page out mmap()'d pages when memory is low, and I'm just doing something wrong in how I use it?

FWIW my code to do the mapping is here:

void FixedSizeDataBuffer :: TryMapToFile(const std::string & filePath, bool createIfNotPresent, bool autoDelete)
{
   const int fd = open(filePath.c_str(), (createIfNotPresent?(O_CREAT|O_EXCL|O_RDWR):O_RDONLY)|O_CLOEXEC, S_IRUSR|(createIfNotPresent?S_IWUSR:0));
   if (fd >= 0)
   {
      if ((autoDelete == false)||(unlink(filePath.c_str()) == 0))  // so the file will automatically go away when we're done with it, even if we crash
      {
         const int fallocRet = createIfNotPresent ? posix_fallocate(fd, 0, _numBytes) : 0;
         if (fallocRet == 0)
         {
            void * mappedArea = mmap(NULL, _numBytes, PROT_READ|(createIfNotPresent?PROT_WRITE:0), MAP_SHARED, fd, 0);
            if (mappedArea != MAP_FAILED)  // mmap() returns MAP_FAILED, not NULL, on error
            {
               printf("FixedSizeDataBuffer %p: Using backing-store file [%s] for %zu bytes of data\n", this, filePath.c_str(), _numBytes);
               _buffer         = (uint8_t *) mappedArea;
               _isMappedToFile = true;
            }
            else printf("FixedSizeDataBuffer %p: Unable to mmap backing-store file [%s] to %zu bytes (%s)\n", this, filePath.c_str(), _numBytes, strerror(errno));
         }
         else printf("FixedSizeDataBuffer %p: Unable to pad backing-store file [%s] out to %zu bytes (%s)\n", this, filePath.c_str(), _numBytes, strerror(fallocRet));
      }
      else printf("FixedSizeDataBuffer %p: Unable to unlink backing-store file [%s] (%s)\n", this, filePath.c_str(), strerror(errno));

      close(fd); // no need to hold this anymore AFAIK, the memory-mapping itself will keep the backing store around
   }
   else printf("FixedSizeDataBuffer %p: Unable to create backing-store file [%s] (%s)\n", this, filePath.c_str(), strerror(errno));
}
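One thing worth noting about mapping code like this: mmap() reports failure by returning MAP_FAILED, which is `(void *) -1` rather than NULL, so a plain `if (mappedArea)` test treats a failed mapping as success. The small sketch here (the function name is hypothetical) provokes a failure on purpose by passing an invalid file descriptor without MAP_ANONYMOUS, which Linux rejects with EBADF, to show what the correct check looks like.

```cpp
#include <sys/mman.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

// Hypothetical demo: maps with fd = -1 and no MAP_ANONYMOUS, which should fail.
// Returns the errno observed, or 0 if mmap() unexpectedly succeeded.
static int DemonstrateMmapFailure()
{
   const size_t numBytes = 4096;
   void * area = mmap(NULL, numBytes, PROT_READ, MAP_SHARED, -1, 0);
   if (area == MAP_FAILED)   // the right test; MAP_FAILED is non-NULL, so if (area) would pass
   {
      const int err = errno;  // save errno before calling anything else
      printf("mmap() failed (%s)\n", strerror(err));
      return err;
   }
   munmap(area, numBytes);
   return 0;
}
```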

I can rewrite TryMapToFile() to use plain-old-file-I/O if I have to, but it would be nice if mmap() could do the job (or if not, I'd at least like to understand why not).
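If it does come to plain-old-file-I/O, a common approach is to process the file in fixed-size chunks with pread()/pwrite(), so the buffer the program allocates never exceeds the chunk size regardless of how large the file is. This is a sketch under assumptions: the name `XorFileInChunks` and the XOR transform are placeholders for whatever in-place modification the real program performs.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <cstdint>
#include <vector>

// Hypothetical helper: XORs every byte of the file open on `fd` with `xorMask`,
// reading and rewriting at most `chunkSize` bytes at a time, so the buffer this
// process allocates stays the same size no matter how large the file is.
static bool XorFileInChunks(int fd, uint8_t xorMask, size_t chunkSize)
{
   if (chunkSize == 0) return false;

   std::vector<uint8_t> chunk(chunkSize);
   off_t offset = 0;
   while(true)
   {
      const ssize_t got = pread(fd, chunk.data(), chunk.size(), offset);
      if (got < 0)
      {
         if (errno == EINTR) continue;   // interrupted by a signal; just retry
         return false;
      }
      if (got == 0) return true;         // reached end of file

      for (size_t i=0; i<(size_t)got; i++) chunk[i] ^= xorMask;   // the in-place "modification"

      // pwrite() may write fewer bytes than asked, so loop until the whole chunk is back
      size_t written = 0;
      while(written < (size_t)got)
      {
         const ssize_t w = pwrite(fd, chunk.data()+written, (size_t)got-written, offset+(off_t)written);
         if (w < 0)
         {
            if (errno == EINTR) continue;
            return false;
         }
         written += (size_t)w;
      }
      offset += got;
   }
}
```

The kernel's page cache will still hold recently touched file data, but that is cache the kernel can write back and reclaim, rather than memory the program itself has pinned.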


Solution

  • After much further experimentation, I determined that the OOM-killer was visiting me not because the system had run out of RAM, but because RAM would occasionally become sufficiently fragmented that the kernel couldn't find a set of physically-contiguous RAM pages large enough to meet its immediate needs. When this happened, the kernel would invoke the OOM-killer to free up some RAM to avoid a kernel panic, which is all well and good for the kernel but not so great when it kills a process that the user was relying on to get his work done. :/

    After trying and failing to find a way to convince Linux not to do that (I think enabling a swap partition would avoid the OOM-killer, but doing that is not an option for me on these particular machines), I came up with a hack work-around: I added some code to my program that periodically checks the amount of memory fragmentation reported by the Linux kernel and, if the fragmentation starts looking too severe, preemptively orders a memory-defragmentation, so that the OOM-killer will (hopefully) not become necessary. If defragmentation doesn't appear to be improving matters, then after 20 consecutive checks that still show fragmentation, we also drop the VM page cache as a way to free up contiguous physical RAM. This is all very ugly, but not as ugly as getting a phone call at 3AM from a user who wants to know why their server program just crashed. :/

    The gist of the work-around implementation is below; note that DefragTick(Milliseconds) is expected to be called periodically (preferably once per second).

     // Returns how safe we are from the fragmentation-based-OOM-killer visits.
     // Returns -1 if we can't read the data for some reason.
     static int GetFragmentationSafetyLevel()
     {
        int ret = -1;
        FILE * fpIn = fopen("/sys/kernel/debug/extfrag/extfrag_index", "r");
        if (fpIn)
        {
           char buf[512];
           while(fgets(buf, sizeof(buf), fpIn))
           {  
              const char * dma = (strncmp(buf, "Node 0, zone", 12) == 0) ? strstr(buf+12, "DMA") : NULL;
              if (dma)
              {  
                 // dma= e.g.:  "DMA -1.000 -1.000 -1.000 -1.000 0.852 0.926 0.963 0.982 0.991 0.996 0.998 0.999 1.000 1.000"
                 const char * s = dma+4;  // skip past "DMA ";
                 ret = 0; // ret now becomes a count of "safe values in a row"; a safe value is any number less than 0.500, per me
                 while((s)&&((*s == '-')||(*s == '.')||(isdigit((unsigned char)*s))))
                 {  
                    const float fVal = atof(s);
                    if (fVal < 0.500f)
                    {  
                       ret++;
                       
                       // Advance (s) to the next number in the list
                       const char * space = strchr(s, ' ');   // to the next space
                       s = space ? (space+1) : NULL;
                    }
                    else break;  // oops, a dangerous value!  Run away!
                 }
              }
           }
           fclose(fpIn);
        }
        return ret;
     }
    
     // should be called periodically (e.g. once per second)
     void DefragTick(Milliseconds current_time_in_milliseconds)
     {
         if ((current_time_in_milliseconds-m_last_fragmentation_check_time) >= Milliseconds(1000))
         {
            m_last_fragmentation_check_time = current_time_in_milliseconds;
    
            const int fragmentationSafetyLevel = GetFragmentationSafetyLevel();
            if (fragmentationSafetyLevel < 9)
            {
               m_defrag_pending = true;  // trouble seems to start at level 8
               m_fragged_count++;        // note that we still seem fragmented
            }
            else m_fragged_count = 0;    // we're in the clear!
    
            if ((m_defrag_pending)&&((current_time_in_milliseconds-m_last_defrag_time) >= Milliseconds(5000)))
            {
               if (m_fragged_count >= 20)
               {
                  // FogBugz #17882
                  FILE * fpOut = fopen("/proc/sys/vm/drop_caches", "w");
                  if (fpOut)
                  {
                     const char * warningText = "Persistent Memory fragmentation detected -- dropping filesystem PageCache to improve defragmentation.";
                     printf("%s (fragged count is %i)\n", warningText, m_fragged_count);
                     fprintf(fpOut, "3");
                     fclose(fpOut);
    
                     m_fragged_count = 0;
                  }
                  else
                  {
                     const char * errorText = "Couldn't open /proc/sys/vm/drop_caches to drop filesystem PageCache!";
                     printf("%s\n", errorText);
                  }
               }
    
               FILE * fpOut = fopen("/proc/sys/vm/compact_memory", "w");
               if (fpOut)
               {
                  const char * warningText = "Memory fragmentation detected -- ordering a defragmentation to avoid the OOM-killer.";
                  printf("%s (fragged count is %i)\n", warningText, m_fragged_count);
                  fprintf(fpOut, "1");
                  fclose(fpOut);
    
                  m_defrag_pending   = false;
                  m_last_defrag_time = current_time_in_milliseconds;
               }
               else
               {
                  const char * errorText = "Couldn't open /proc/sys/vm/compact_memory to trigger a memory-defragmentation!";
                  printf("%s\n", errorText);
               }
            }
         }
     }
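To put the 0.500 threshold used above in context: as I understand the kernel source (the `__fragmentation_index()` helper in mm/vmstat.c), each value in extfrag_index is computed per zone and per order roughly as modeled in this sketch, scaled by 1000. The kernel's sysctl documentation describes values tending toward 0 as allocations that would fail for lack of memory, values toward 1000 as failures due to fragmentation, and -1 as an allocation that will succeed as long as watermarks are met. The default `vm.extfrag_threshold` (the index at or below which the kernel won't compact a zone) is 500, which lines up with the cut-off chosen in GetFragmentationSafetyLevel(). The function and parameter names here are hypothetical, and this only models the formula; it does not read the kernel's live values.

```cpp
#include <cstdint>

// Hypothetical model of the kernel's fragmentation index for one zone and one order.
//   freePages          - total number of free pages in the zone
//   freeBlocksTotal    - number of free blocks of any order in the zone
//   freeBlocksSuitable - number of free blocks of at least 2^order pages
// Returns the index times 1000 (e.g. 852 corresponds to "0.852" in extfrag_index).
static int FragmentationIndex(unsigned order, uint64_t freePages, uint64_t freeBlocksTotal, uint64_t freeBlocksSuitable)
{
   const uint64_t requested = 1ULL << order;   // pages needed for one block of this order

   if (freeBlocksTotal == 0) return 0;         // no free memory at all
   if (freeBlocksSuitable > 0) return -1000;   // a large-enough block exists; shown as -1.000

   // The more pieces the free memory is split into, the closer the result gets to 1000
   return 1000 - (int)((1000 + (freePages * 1000) / requested) / freeBlocksTotal);
}
```

For example, 100 free pages scattered as 100 single-page blocks give an order-2 (four-page) index of 1000 - (1000 + 25000) / 100 = 740, i.e. 0.740, which GetFragmentationSafetyLevel() would treat as a dangerous value.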