c++ memory-mapped-files boost-interprocess boost-geometry r-tree

Estimating size required for memory mapped boost rtree


We have a scenario where we dynamically grow a memory-mapped file that backs Boost's R-tree geometric index. We also use Boost Interprocess's memory-mapped file APIs.

The mechanics of unmapping the file, growing it, and remapping it are already sorted out; this all works.

Thus far we've tried overestimating the size at 10 times the inherent size of our coordinates. That works, but inspecting the file with du shows it is a gross overestimate.

Is there some way to predict (worst-case or precise) how large the mapped file needs to grow for a given number of objects? Underestimating, for example with a factor of 5, eventually causes stack smashing...

Thanks


Solution

  • In the pure sense, the question is largely unrelated to boost-interprocess.

    You want to know the allocation patterns: not just the net allocation volume, but also the effective "pool" use due to fragmentation.

    What you could do is gather statistics on allocator use (good question: is there some kind of statistics-gathering allocator adaptor?) and work it out from there.

    Of course you'll be in approximation territory, but given enough simulation runs you should have usable information.
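
    A statistics-gathering adaptor of the kind asked about above can be sketched with the standard library alone. The names AllocStats, CountingAllocator and measure_list_nodes are hypothetical, and std::list only stands in for a node-based container here; whether the adaptor can be passed unchanged as the rtree's allocator template argument is an assumption that has not been checked against Boost Geometry.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>

// Hypothetical counters shared by all rebound copies of the allocator.
struct AllocStats {
    std::size_t allocations  = 0;  // number of allocate() calls
    std::size_t bytes_in_use = 0;  // bytes currently allocated
    std::size_t peak_bytes   = 0;  // high-water mark of bytes_in_use
};

// Minimal allocator adaptor: forwards to std::allocator and records usage.
template <class T>
struct CountingAllocator {
    using value_type = T;

    AllocStats* stats;

    explicit CountingAllocator(AllocStats* s) noexcept : stats(s) {}

    // Rebinding constructor, so containers can allocate their node types.
    template <class U>
    CountingAllocator(const CountingAllocator<U>& other) noexcept
        : stats(other.stats) {}

    T* allocate(std::size_t n) {
        T* p = std::allocator<T>().allocate(n);
        ++stats->allocations;
        stats->bytes_in_use += n * sizeof(T);
        if (stats->bytes_in_use > stats->peak_bytes) {
            stats->peak_bytes = stats->bytes_in_use;
        }
        return p;
    }

    void deallocate(T* p, std::size_t n) noexcept {
        stats->bytes_in_use -= n * sizeof(T);
        std::allocator<T>().deallocate(p, n);
    }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>& a, const CountingAllocator<U>& b) noexcept {
    return a.stats == b.stats;
}

template <class T, class U>
bool operator!=(const CountingAllocator<T>& a, const CountingAllocator<U>& b) noexcept {
    return !(a == b);
}

// Fills a node-based container with n elements and reports what it allocated.
inline AllocStats measure_list_nodes(std::size_t n) {
    AllocStats stats;
    {
        CountingAllocator<int> alloc(&stats);
        std::list<int, CountingAllocator<int>> values(alloc);
        for (std::size_t i = 0; i < n; ++i) {
            values.push_back(static_cast<int>(i));
        }
    }  // the list is destroyed here, so bytes_in_use drops back to zero
    assert(stats.bytes_in_use == 0);
    return stats;
}
```

    Dividing peak_bytes by the element count over several runs with realistic data gives a rough bytes-per-object figure. It counts only what the container requests, not the bookkeeping or fragmentation that the segment's own allocation algorithm adds on top.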

    As a last resort, the source code or the developers of Boost Geometry would be the place to ask.

    Other angles

    With regard to Boost Interprocess, we don't know what you're using. I will assume managed_mapped_file/managed_heap_memory/managed_external_buffer. Note that with the default memory allocation algorithm (rbtree_best_fit) there can be considerable fragmentation, or simply allocation overhead, especially for node-based containers, which likely include rtree.

    Relevant example:

    [image from the original answer not preserved]

    This immediately gives you some ideas to use:

    • shm.get_free_memory() to get a raw number for space remaining in the segment
    • using a memory profiler (like Valgrind Massif) when using the rtree outside of a managed memory segment

    Out Of The Box:

    • Lastly, don't forget that it can be very cheap to allocate a sparse file, even of say 10 GiB, and just allocate into that using a shared memory manager: only the pages actually in use will be committed to disk, so the actual disk usage will closely match the actual required size.

      Sparse files are magic.
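
    The effect can be observed directly with a small POSIX-only sketch. The helper name create_sparse_file and the FileUsage struct are hypothetical, and the code assumes a filesystem that supports sparse files and reports st_blocks in 512-byte units, as Linux does. Whether a file created through a managed segment ends up sparse in the same way depends on the platform and Boost version, so check it with du.

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>
#include <stdexcept>

// Apparent size (what ls shows) versus blocks actually allocated (what du shows).
struct FileUsage {
    std::uint64_t apparent_bytes;
    std::uint64_t allocated_bytes;
};

// Creates (or truncates) a file and extends it to `size` bytes without
// writing any data, then reports how much disk space it really occupies.
inline FileUsage create_sparse_file(const char* path, std::uint64_t size) {
    const int fd = ::open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        throw std::runtime_error("open failed");
    }
    // ftruncate extends the file; pages that were never written form a hole.
    if (::ftruncate(fd, static_cast<off_t>(size)) != 0) {
        ::close(fd);
        throw std::runtime_error("ftruncate failed");
    }
    struct stat st {};
    if (::fstat(fd, &st) != 0) {
        ::close(fd);
        throw std::runtime_error("fstat failed");
    }
    ::close(fd);
    assert(st.st_size == static_cast<off_t>(size));

    FileUsage usage;
    usage.apparent_bytes  = static_cast<std::uint64_t>(st.st_size);
    // Assumption: st_blocks counts 512-byte units (true on Linux).
    usage.allocated_bytes = static_cast<std::uint64_t>(st.st_blocks) * 512u;
    return usage;
}
```

    On such a filesystem, a freshly created 64 MiB file reports an allocated size far below its apparent size, which is the gap between ls -l and du that the sparse-file approach relies on.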