For the program below, which uses Boost.Interprocess shared memory:
#include <iostream>
#include <boost/interprocess/mapped_region.hpp>
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/containers/list.hpp>

#define DATAOUTPUT "OutputFromObject"
#define INITIAL_MEM 650000

namespace bip = boost::interprocess;

class SharedObject
{
public:
    unsigned int tNumber;
    bool pRcvdFlag;
    bool sRcvdFlag;
    unsigned long lTimeStamp;
};

typedef bip::allocator<SharedObject, bip::managed_shared_memory::segment_manager> ShmemAllocator;
typedef bip::list<SharedObject, ShmemAllocator> SharedMemData;

int main()
{
    bip::shared_memory_object::remove(DATAOUTPUT);
    bip::managed_shared_memory* seg = new bip::managed_shared_memory(bip::create_only, DATAOUTPUT, INITIAL_MEM);
    const ShmemAllocator alloc_inst(seg->get_segment_manager());
    SharedMemData* sharedMemOutputList = seg->construct<SharedMemData>("TrackOutput")(alloc_inst);

    std::size_t beforeAllocation = seg->get_free_memory();
    std::cout << "\nBefore allocation = " << beforeAllocation << "\n";

    SharedObject temp;
    sharedMemOutputList->push_back(temp);

    std::size_t afterAllocation = seg->get_free_memory();
    std::cout << "After allocation = " << afterAllocation << "\n";
    std::cout << "Difference = " << beforeAllocation - afterAllocation << "\n";
    std::cout << "Size of SharedObject = " << sizeof(SharedObject) << "\n";
    std::cout << "Size of SharedObject's temp instance = " << sizeof(temp) << "\n";

    seg->destroy<SharedMemData>("TrackOutput");
    delete seg;
} //main
The output is:
Before allocation = 649680
After allocation = 649632
Difference = 48
Size of SharedObject = 16
Size of SharedObject's temp instance = 16
If the size of SharedObject and its instance is 16 bytes, how can the difference in allocation be 48? Even if padding had automatically been done, that is too much to account for: three times the size (for larger structures it drops to about 1.33 times the size).
Because of this, I'm unable to allocate and dynamically grow the shared memory reliably. If SharedObject itself contains a list that grows dynamically, that adds even more uncertainty to the space allocation. How can these situations be safely handled?
PS: to run the program, you have to link against the pthread library and librt (e.g. -lpthread -lrt).
Update:
This is the memory usage pattern I got when I tabulated values over multiple runs (the "memory increase" column is the current row's "memory used" minus the previous row's "memory used"):
╔═════════════╦════════════════╦═════════════════╗
║ memory used ║ structure size ║ memory increase ║
╠═════════════╬════════════════╬═════════════════╣
║ 48 ║ 1 ║ ║
║ 48 ║ 4 ║ 0 ║
║ 48 ║ 8 ║ 0 ║
║ 48 ║ 16 ║ 0 ║
║ 64 ║ 32 ║ 16 ║
║ 64 ║ 40 ║ 0 ║
║ 80 ║ 48 ║ 16 ║
║ 96 ║ 64 ║ 32 ║
║ 160 ║ 128 ║ 64 ║
║ 288 ║ 256 ║ 128 ║
║ 416 ║ 384 ║ 128 ║
║ 544 ║ 512 ║ 128 ║
║ 800 ║ 768 ║ 256 ║
║ 1056 ║ 1024 ║ 256 ║
╚═════════════╩════════════════╩═════════════════╝
IMPORTANT: the table above applies only to the shared-memory list. For vector, the (memory used, structure size) values are (48, 1), (48, 8), (48, 16), (48, 32), (80, 64), (80, 72), (112, 96), (128, 120), (176, 168), (272, 264), (544, 528). So a different memory-calculation formula is needed for other containers.
Remember that any general-purpose allocation mechanism adds a payload to each allocation in order to store information about how to deallocate that memory, how to merge that buffer with adjacent buffers, etc. This happens with your system malloc (typically 8-16 extra bytes per allocation, plus extra alignment). The memory allocator in shared memory has an overhead of 4-8 bytes on 32-bit systems and 8-16 bytes on 64-bit systems.
Then the library needs to store the number of objects in order to call the right number of destructors when you call destroy_ptr(ptr) (you can allocate arrays, so it needs to know how many destructors shall be called). And you've made a named allocation, so the library needs to store that string in shared memory, plus some metadata to find it (a pointer to the string, and a note that this was a "named" allocation rather than an "anonymous" or "instance" allocation).
So: 16 bytes of data + 8 bytes from the memory allocator + 8 bytes for the pointer and metadata of the name + 12 bytes for the string "TrackOutput" (including the null terminator), rounded up to 8-byte alignment, gives 48 bytes.
The overhead is nearly constant for each allocation, so the constant factor 1.33 applies only to small allocations. If you allocate a single byte you'll get a much worse factor, just as when you allocate a single char from the heap.
The library throws an exception if there is not enough memory available to store a new object; you can catch it and try to create a new managed shared memory. Note that memory gets fragmented by allocations and deallocations, so even with free bytes in shared memory, the managed shared memory might be unable to service your request because there is no contiguous block big enough to fulfill it. A shared memory segment can't be automatically expanded, so fragmentation is a much bigger issue than with heap memory.
You can't dynamically grow a shared memory segment while other processes are connected to it: they would crash trying to access unmapped pages. The only alternatives are to preallocate a segment that will be big enough (adding a constant padding factor), or to allocate a new managed shared memory and notify all other readers/writers (perhaps via a preallocated structure in the original shared memory) that new elements will go into the new segment.