Memory usage when using boost::iostreams::mapped_file


Here is some code that uses Boost.Iostreams to memory-map a file and then write to the mapped region:

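    #include <string>
    #include <unordered_map>
    #include <utility>
    #include <boost/iostreams/device/mapped_file.hpp>

    typedef std::unordered_map<int, std::string> work;

    int main()
    {
        work d;
        d[0] = "a";

        // Create and map a new 1 GB file for reading and writing.
        boost::iostreams::mapped_file_params params;
        params.path = "map.dat";
        params.new_file_size = 1000000000;
        params.mode = (std::ios_base::out | std::ios_base::in);
        boost::iostreams::mapped_file mf;
        mf.open(params);

        // Reinterpret the start of the mapping as a `work` object.
        work* w = static_cast<work*>((void*)mf.data());
        w[0] = d;
        for (int i = 1; i < 1000000000; ++i)
        {
            w->insert(std::make_pair(i, "abcdef"));
        }
        mf.close();
    }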

When I ran this on my CentOS 6 box with 8 processors and 16 GB of RAM, I observed the following:

  1. While data was being inserted into the memory-mapped file, RES (from the top command) increased continuously and reached 14 GB. I was under the impression that when I mmap a file, VIRT increases but not RES. So when we write to the mmap'ed file, is the data first written to memory and then committed to disk? Or is some intermediate buffer/cache used?

  2. With the help of the "free" command, I also observed that once memory usage reaches 16 GB, buffers are used. Here are some snapshots of "free" output taken at different times while the above code was running:

                total       used       free     shared    buffers     cached
    Mem:      16334688   10530380    5804308          0     232576    9205532
    -/+ buffers/cache:    1092272   15242416
    Swap:     18579448     348020   18231428
    
                total       used       free     shared    buffers     cached
    Mem:      16334688   13594208    2740480          0     232608    9205800
    -/+ buffers/cache:    4155800   12178888
    Swap:     18579448     348020   18231428
    
                total       used       free     shared    buffers     cached
    Mem:      16334688   15385944     948744          0     232648    9205808
    -/+ buffers/cache:    5947488   10387200
    Swap:     18579448     348020   18231428
    
                total       used       free     shared    buffers     cached
    Mem:      16334688   16160368     174320          0     204940    4049224
    -/+ buffers/cache:   11906204    4428484
    Swap:     18579448     338092   18241356
    
                total       used       free     shared    buffers     cached
    Mem:      16334688   16155160     179528          0     141584    2397820
    -/+ buffers/cache:   13615756    2718932
    Swap:     18579448     338092   18241356
    
                total       used       free     shared    buffers     cached
    Mem:      16334688   16195960     138728          0       5440      17556
    -/+ buffers/cache:   16172964     161724
    Swap:     18579448     572052   18007396
    

    What does this behavior signify?

  3. Writing data to the memory-mapped file took much longer than writing to memory. What is the reason for this?

    I wanted to use memory mapping to bring down RES usage, as I deal with huge data sets, but it does not seem to work that way. I wanted to keep all the data in memory-mapped files and read it back when required.

    Am I using memory mapping incorrectly, or is this simply how it behaves?


Solution

    1. VIRT will increase immediately (all pages are mapped into the process address space). RES will increase as pages are actually touched, because each first access faults the page into physical memory.

      This happens for as long as there is sufficient memory available, after which the OS starts evicting least-recently-used (LRU) pages from the resident set (unless they were locked with VirtualLock/mlock or are otherwise unmovable, such as kernel pages, DMA buffers, or security-sensitive data).

      So, the OS optimistically leaves the pages resident for as long as possible (which improves performance as long as no other processes contend for the memory).
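
      The difference between mapping a page and touching it can be observed directly. The following is a minimal sketch, not part of the original code: it assumes Linux and uses the POSIX `mmap` call together with `mincore` instead of Boost, and the names `Residency`, `resident_pages`, and `touch_and_measure` as well as the file path are hypothetical helpers chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Residency counts, in pages, for one mapping (hypothetical helper type).
struct Residency {
    std::size_t total;   // pages in the mapping
    std::size_t before;  // resident pages right after mmap()
    std::size_t after;   // resident pages after writing to every page
};

// Ask the kernel (Linux mincore) how many pages of [addr, addr + len) are in RAM.
static std::size_t resident_pages(void* addr, std::size_t len, std::size_t page) {
    std::vector<unsigned char> vec((len + page - 1) / page);
    if (mincore(addr, len, vec.data()) != 0) return 0;
    std::size_t n = 0;
    for (unsigned char v : vec) n += (v & 1u);  // low bit set => page is resident
    return n;
}

// Map a fresh file of `pages` pages, then write one byte to each page.
// Returns all-zero counts if any system call fails.
Residency touch_and_measure(const char* path, std::size_t pages) {
    Residency r = {0, 0, 0};
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t len = pages * page;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return r;
    if (ftruncate(fd, static_cast<off_t>(len)) != 0) {
        close(fd);
        unlink(path);
        return r;
    }

    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        unlink(path);
        return r;
    }

    r.total = pages;
    // The address range is reserved at this point (this is what VIRT counts).
    r.before = resident_pages(p, len, page);

    // The first write to each page faults it into physical memory (RES).
    char* bytes = static_cast<char*>(p);
    for (std::size_t i = 0; i < pages; ++i) bytes[i * page] = 1;
    r.after = resident_pages(p, len, page);

    munmap(p, len);
    close(fd);
    unlink(path);
    return r;
}
```

      Calling `touch_and_measure("/tmp/residency_demo.dat", 16)` should report `after` equal to `total`, while `before` is typically smaller, because the untouched pages of a newly created file have not been faulted in yet.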

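    2. This signifies that the OS is doing its job. As free memory runs out, the kernel reclaims the page cache (the shrinking "cached" column in your snapshots) and eventually starts using swap.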

    3. You're writing to disk. Disk access is (a lot) slower than memory access. How often the data actually gets written out to disk depends on tuning. This answer lists some of the tuning parameters that are available on Linux (which you seem to be using):