Search code examples
c++c++11boostshared-memoryboost-interprocess

move boost::interprocess::string in shared memory


I wanted to implement, some queue of messages (based on vector) to handling in some way data from the network and to do this I used shared memory to save messages and I encountered an issue related to it, the thing is that my code works well when I run it first time, when I want to run it once again I get segfaut when I want to assign a new value to string in my queue in shared memory, actually in my case when I want to move it (the same problem exists when I want to copy it). The problem doesn't exist when SSO is working, so when I have small string enough. What did I wrong ?

#include <atomic>
#include <exception>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include <boost/interprocess/allocators/allocator.hpp>
#include <boost/interprocess/containers/string.hpp>
#include <boost/interprocess/containers/vector.hpp>
#include <boost/interprocess/managed_shared_memory.hpp>

namespace bip = boost::interprocess;

struct BadSharedMemoryAccess final : public std::exception
{
    BadSharedMemoryAccess(std::string&& msg):
        msg_{std::move(msg)}
{}

virtual const char* what() const noexcept
{
    return msg_.c_str();
}

private:
    std::string msg_;
};

struct Message
{
    bip::string message_;
};

template<typename Alloc>
class MyCustomData final
{
public:
    using allocator_type = typename Alloc::template rebind<Message>::other;

    MyCustomData(std::size_t number_of_messages, Alloc alloc = {}) :
        init_add_index_{0},
        init_handle_index_{-1},
        messages_{number_of_messages, alloc}
    {}

public:
    uint_fast64_t init_add_index_;
    int_fast64_t init_handle_index_;
    std::vector<Message, Alloc> messages_;
//    bip::vector<data::Message, Alloc> messages_;
};

template<typename DataType, typename DataAllocator>
class SharedMemory
{
public:
    template<typename... Args>
    SharedMemory(std::string const& shm_segment_name, std::size_t const segment_size,
        std::string const& shm_object_name, Args&&... args) :
            shm_object_name_{shm_object_name}
    {
        std::cout << "attempt to allocate space for shared memory segment " << shm_segment_name
              << ", size: ." << segment_size << std::endl;
        setSharedMemorySize(shm_segment_name, segment_size);

        DataAllocator const allocInstance{shm_.get_segment_manager()};
        data_ = shm_.find_or_construct<DataType>(shm_object_name.c_str())(std::forward<Args>(args)..., allocInstance);
        if (data_)
            std::cout << "shared memory segment has been allocated" << std::endl;
        else
            std::cout << "shared memory has not been constructed or founded" << std::endl;
    }

    virtual ~SharedMemory()
    {
        std::cout << "shared memory segment will be closed." << std::endl;
    }

    void setSharedMemorySize(std::string const& shm_segment_name, std::size_t const segment_size)
    {
        auto page_size = bip::mapped_region::get_page_size();
        auto const page_increase_rate{2};
        while (page_size < segment_size)
        {
            page_size *= page_increase_rate;
        }

        std::cout <<"seting page size: " << page_size << std::endl;
        shm_ = bip::managed_shared_memory{bip::open_or_create, shm_segment_name.c_str(), page_size};
        std::cout << "space for shared memory has been successfully allocated." << std::endl;
    }

    DataType& getData()
    {
        if (not data_)
            throw BadSharedMemoryAccess{"cannot access " + shm_object_name_};
        return *data_;
    }

protected:
    DataType* data_;

private:
    std::string const shm_object_name_;
    bip::managed_shared_memory shm_;
};

namespace sharable
{
    using DataAllocator = bip::allocator<Message, bip::managed_shared_memory::segment_manager>;
    template<typename Alloc>
    using DataType = MyCustomData<Alloc>;
}

int main()
{
    std::size_t const max_number_of_elements_in_container{1000000};
    auto shmem_data = std::make_shared<SharedMemory<MyCustomData<sharable::DataAllocator>, sharable::DataAllocator>>(
        "SHM_SEGMENT", sizeof(MyCustomData<sharable::DataAllocator>) +
            (max_number_of_elements_in_container * sizeof(Message) * 2),
        "SHM_CONTAINER", max_number_of_elements_in_container);

    std::vector<bip::string> feed{max_number_of_elements_in_container};
    for (std::size_t i = 0; i < max_number_of_elements_in_container; ++i)
    {
        std::string s{"blablabla11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111" + std::to_string(i)};
        feed[i] = s.c_str();
    }

    auto& data = shmem_data->getData();
    auto& shmem_vec = data.messages_;
    std::cout << "addr: " << shmem_vec.data() << std::endl;
    for (std::size_t i = 0; i < max_number_of_elements_in_container; ++i)
    {
//        if (i == 0)
//            std::cout << "msg: " << shmem_vec[i].message_ << std::endl;
        auto msg = feed[i];
        shmem_vec[i].message_ = std::move(msg);
    }
    return 0;
}

Solution

    1. You are not using a shared-memory allocator for the strings. In that sense your question is the same as circular_buffer and managed_mapped_file segmentation fault. You might want to read that for a general intro.

    2. Your example complicates things by wrapping the strings into your own structs. That means you get a lot of tedious work passing around allocators. For the "uses_allocator" approach which - in combination with scoped_allocator_adaptor - can alleviate some of that pain, see e.g. making non-shared copies of boost::interprocess shared memory objects.

    3. Reading the rest of your code, I'm a bit confused. Why would you template your SharedMemory type with an allocator? I mean, the SharedMemory should be the single point responsible for chosing and passing the right allocator, right? How could it work with an externally provided allocator.

    4. There are typedefs that are unused, you make a new segment for each object even though it might be from the same shared memory (mapping the same pages into memory multiple times). Yet you somehow think it's important to share ownership of one such an instance (make_shared).

    5. The size calculations are just wrong: they only take into account the size of your Message struct, not the allocated string data. You seem to have forgotten that mapped memory is virtual memory too. The underlying storage will be able to allocate sparsely. So, why not reserve a generous amount of memory, and just respond when you run out?

    6. You're talking about and coding (some) move semantics, but then you write:

      for (std::size_t i = 0; i < max_number_of_elements_in_container; ++i) {
          auto msg = feed[i];
          shmem_vec[i].message_ = std::move(msg);
      }
      

      That's confused. What good is the move (if it worked, see below) if you first make an explicit copy anyways:

          auto msg = feed[i];
      
    7. These are worrying signs:

      uint_fast64_t init_add_index_;
      int_fast64_t  init_handle_index_;
      

      It looks as if you may be planning to use these concurrently from multiple processes/threads². In such case, you must add synchronization OR use atomic<> types at the very least.

    Summarizing it looks to me that you may be trying so hard to hide complexity that you accidentally increased it.

    On Moving

    You ask about "moving shared string in shared memory". For this part of the question, let's assume you actually had your strings allocating in shared memory.

    Looking at how moving strings works it's not hard to see that moving strings inside shared memory will work exactly like moving them inside the heap would work: the object address will be different, but the internal pointer to allocated memory will be the same.

    However, the code does something else: It doesn't move inside shared memory. It attempts to move from heap to shared memory. This will obviously not be safe since the objects in shared memory cannot usefully point to anything outside the shared memory segment (any other process would invoke undefined behaviour indirecting through such a pointer).

    As often, in C++ you're partly your own to prevent accidents like this: C++11 basic_string<>::swap specifies

    The behavior is undefined if Allocator does not propagate on swap and the allocators of *this and other are unequal.

    The move-constructor is specified to have complexity:

    constant. If alloc is given and alloc != other.get_allocator(), then linear

    Note that the semantics of allocators when copying/moving containers (basic_string<> is a container, similar to std::vector<>) is even more involved:

    enter image description here

    What To Do?

    All in all, if you're lucky the move won't compile because the allocators are of incompatible types and none is supplied (e.g. by the uses_allocator protocol).

    If you're less lucky, it will compile but it will (fortunately) not perform the move because it detects that the allocators are "not equal" and hence it falls back to copying the storage.

    If you're absolutely unlucky, you chose a configuration where the types are compatible and the allocators are not configured to safely propagate on container move/copy, or another circumstance leads the allocators to fail to detect "incompatibility"¹, and you end up with UB.

    In this case there's a much easier option: you know you can't move. Hence, don't request a move.

    Risk averted.

    Some Code To Heal Our Wounds

    After breaking down a lot of the complexity in the code and question, let's get constructive and show what we can do to fix things:

    #include <exception>
    #include <iomanip>
    #include <iostream>
    #include <random>
    
    #include <boost/interprocess/allocators/allocator.hpp>
    #include <boost/interprocess/containers/string.hpp>
    #include <boost/interprocess/containers/vector.hpp>
    #include <boost/interprocess/managed_shared_memory.hpp>
    
    namespace bip = boost::interprocess;
    
    struct BadSharedMemoryAccess final : std::runtime_error {
        BadSharedMemoryAccess(std::string msg) : std::runtime_error{ std::move(msg) } {}
    };
    

    That's the prelude. Now, let's state our intentions:

    using Segment = bip::managed_shared_memory;
    template <typename U> using Alloc = bip::allocator<U, Segment::segment_manager>;
    

    This makes it easy to refer to (and maybe switch out) the segment and its allocators.

    using Message       = bip::string;
    using Feed          = bip::vector<Message>;
    using SharedMessage = bip::basic_string<char, std::char_traits<char>, Alloc<char> >;
    using SharedFeed    = bip::vector<SharedMessage, Alloc<SharedMessage> >;
    

    Simply define our domain entities. By using bip::string/bip::vector for heap and shared allocation versions we get the best interop between the two;

    class MyCustomData final {
      public:
        using allocator_type = SharedFeed::allocator_type;
    
        MyCustomData(std::size_t capacity, allocator_type alloc)
            : messages_(capacity, SharedMessage(alloc), alloc) // don't brace initlaize
        { }
    
        auto&       messages()       { return messages_; }
        auto const& messages() const { return messages_; }
    
      private:
        uint_fast64_t init_add_index_ = 0;
        int_fast64_t  init_handle_index_ = -1;
        SharedFeed messages_;
    };
    

    For now, dropped the virtual destructor, and the Message struct that simply wrapped a bip::string for convenience.

    template <typename T> class SharedMemory final {
      public:
        template <typename... Args>
        SharedMemory(std::string const& shm_segment_name,
                     std::size_t const segment_size,
                     std::string const& shm_object_name,
                     Args&&... args)
            : shm_ { bip::open_or_create, shm_segment_name.c_str(), segment_size }
        {
            data_ = shm_.find_or_construct<T>
                (shm_object_name.c_str())
                (std::forward<Args>(args)...,
                 shm_.get_segment_manager())
                ;
    
            if (!data_) throw BadSharedMemoryAccess {"cannot access " + shm_segment_name + "/" + shm_object_name};
        }
    
        T const& get() const { return *data_; }
        T&       get()       { return *data_; }
    
        auto free() const { return shm_.get_free_memory(); }
      protected:
        T* data_;
    
      private:
        Segment shm_;
    };
    

    It strikes me that SharedMemory has too many responsibilities: on the one hand it tries to be a "smart-reference" for shared objects, and on the other hand it "manages a segment". This leads to problems if you actually wanted to have multiple objects in a segment. Consider splitting into Shared::Segment and Shared::Object<T>.

    Feed generate_heap_feed(size_t n) {
        Feed feed;
        feed.reserve(n);
        for (size_t i = 0; i < n ; ++i) {
            feed.emplace_back(
                "blablabla11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111"
                + std::to_string(i));
        }
        return feed;
    }
    

    Extracted the test-feed generator from main.

    int main() {
        static constexpr std::size_t capacity { 1000000 };
        static constexpr auto estimate = 300ull << 20; // 300 MiB (<< 10 kilo, << 20 mebi, << 30 gibi)
    

    Replaced the misguided calculations³ with a generous estimate. See measurements below.

        using SharedData = SharedMemory<MyCustomData>;
        SharedData shmem_data("SHM_SEGMENT", estimate, "SHM_CONTAINER", capacity);
        std::cout << "Free: " << shmem_data.free() << "\n";
    

    Nice and readable. On my system prints "Free: 282572448" on first run.

        Feed const feed      = generate_heap_feed(capacity);
        SharedFeed& shm_feed = shmem_data.get().messages();
    

    Now we have our feeds side by side, let's copy:

        // copy feed from heap to shm
        auto const n = std::min(feed.size(), shm_feed.size());
        std::copy_n(feed.begin(), n, shm_feed.begin());
    
        std::cout << "Copied: " << n << "\n";
        std::cout << "Free: " << shmem_data.free() << "\n";
    

    That's all. We don't try to move, because we know that can't work. bip::basic_string correctly knows how to copy between incompatible allocators. No sweat.

    For good measure let's print some diagnostic information:

        {
            // check some random samples
            std::default_random_engine prng{std::random_device{}()};
            auto pick = [&] { return std::uniform_int_distribution<>(0, n-1)(prng); };
    
            for (auto index : {pick(), pick(), pick(), pick()}) {
                std::string_view a = feed.at(index);
                std::string_view b = shm_feed.at(index);
                std::cout << "Message #" << index
                    << (a == b? " OK":" FAIL")
                    << " " << std::quoted(b) << std::endl;
            }
        }
    }
    

    See it Live On Coliru⁴

    Prints, e.g.: enter image description here

    Especially note the filesize measurements (--apparent-size vs. the size on disk). This confirms my point about sparse allocation. Even if you reserved 100TB, the effective size of the SHM_CONTAINER would still be 182MiB.

    BONUS SECTIONS

    Scoped Allocator Adaptors

    Simply replacing one line:

    template <typename U> using Alloc = bip::allocator<U, Segment::segment_manager>;
    

    with

    template <typename U> using Alloc = boost::container::scoped_allocator_adaptor<
        bip::allocator<U, Segment::segment_manager> >;
    

    does the trick, unlocking magical allocator propagation, e.g. from vector to string when constructing its elements (with emplace or assign). So we can simplify the copy_n even more from:

    // copy feed from heap to shm
    auto const n = std::min(feed.size(), shm_feed.size());
    std::copy_n(feed.begin(), n, shm_feed.begin());
    
    std::cout << "Copied: " << n << "\n";
    

    to simply:

    shm_feed.assign(feed.begin(), feed.end());
    std::cout << "Copied: " << shm_feed.size() << "\n";
    

    It has exactly the same allocation behaviour as before. See it Live On Coliru as well.

    Polymorphic Allocators (c++17)

    This would not change a thing, fundamentally, except:

    • it would make Feed/SharedFeed and Message/SharedMessage share the same static type
    • it would have the scoped-allocator behaviour as before by default

    However, until we get proper support for fancy pointers in the standard, this is a pipe dream:

    Making Message Struct Again?

    Well. More like "struggle again". I admit that I hate writing allocator-aware data types. This is no doubt not optimal, but it is the minimal thing I could do to make things working:

    template <typename Alloc>
    struct BasicMessage {
        // pre-c++17:
        //  using allocator_type = typename Alloc::template rebind<char>::other;
        using allocator_type = typename std::allocator_traits<Alloc>::template rebind_alloc<char>;
    
        BasicMessage(std::allocator_arg_t, allocator_type alloc)
            : _msg(alloc) { }
    
        template <typename T1, typename... T,
                 typename = std::enable_if_t<
                        not std::is_same_v<std::allocator_arg_t, std::decay_t<T1> >
                     >
            >
        explicit BasicMessage(T1&& a, T&&... init)
            : _msg(std::forward<T1>(a), std::forward<T>(init)...) { }
    
        template <typename OtherAlloc>
        BasicMessage(BasicMessage<OtherAlloc> const& other, allocator_type alloc)
            : _msg(other.message().begin(), other.message().end(), alloc) { }
    
        template <typename OtherAlloc, typename OM = BasicMessage<OtherAlloc> >
        std::enable_if_t<
            not std::is_same_v<allocator_type, typename OM::allocator_type>,
            BasicMessage&>
        operator=(BasicMessage<OtherAlloc> const& other) {
            _msg.assign(other.message().begin(), other.message().end());
            return *this;
        }
    
        template <typename OtherAlloc>
        BasicMessage(std::allocator_arg_t, allocator_type alloc, BasicMessage<OtherAlloc> const& other)
            : _msg(other.message().begin(), other.message().end(), alloc) { }
    
        BasicMessage(BasicMessage const&) = default;
        BasicMessage(BasicMessage&&) = default;
        BasicMessage& operator=(BasicMessage const&) = default;
        BasicMessage& operator=(BasicMessage&&) = default;
    
        auto& message() const { return _msg; }
        auto& message()       { return _msg; }
      private:
        bip::basic_string<char, std::char_traits<char>, allocator_type> _msg;
    };
    
    using Message       = BasicMessage<std::allocator<char> >;
    using Feed          = bip::vector<Message>;
    using SharedMessage = BasicMessage<Alloc<char> >;
    using SharedFeed    = bip::vector<SharedMessage, Alloc<SharedMessage> >;
    

    On the bright side, that still uses the "magic assign" due to the scoped_allocator_adaptor fix introduced above. Perhaps if that wasn't desired, you could get away with a little bit less complexity.

    With minor interface changes elsewhere:

    : messages_(capacity, SharedMessage(std::allocator_arg, alloc), alloc) // don't brace initlaize
    

    and

        std::string_view a = feed.at(index).message();
        std::string_view b = shm_feed.at(index).message();
    

    it all still works, see Live On Coliru


    ¹ not standardese, hence the scare-quotes

    ² I'm suspecting you may be trying to implement the Disruptor Pattern

    ³ see Estimating size required for memory mapped boost rtree

    ⁴ replaced managed_shared_memory with manage_mapped_file and reduced capacities`because of Coliru limitations