
Copy-free thread-safe Ring Buffer for Big Arrays

For signal processing on big arrays (10^7 elements), I use several threads connected by ring buffers. Unfortunately, copying the data into and out of the buffers takes too much time. The current implementation is based on boost::lockfree::spsc_queue.

So I am looking for a way to swap ownership of the vectors between the threads and the buffer by holding the vectors through unique_ptr (see the attached drawing: swapping pointers between the threads and the queue).

Moving the smart pointers through the queue instead doesn't fit my needs: the producer would give up its vector on every push, so I would constantly have to allocate new vectors at runtime. That overhead is larger than copying the data around.
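A small sketch of the difference between moving and swapping (illustrative only; `Buffer`, `hand_over_by_swap` and `hand_over_by_move` are hypothetical names, not code from the question). Swapping two `std::unique_ptr<std::vector<float>>` exchanges only the pointers, so both sides always hold a valid, already-allocated vector. Moving leaves the source null, which is what forces the new allocation described above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// A buffer handle as in the question: ownership of one big vector.
using Buffer = std::unique_ptr<std::vector<float>>;

// Hypothetical helper: hand a filled buffer to a queue slot by swapping.
// Only the two pointers are exchanged; the caller receives the slot's
// previous, already-allocated vector to fill next.
void hand_over_by_swap(Buffer& slot, Buffer& producer_buf) {
    std::swap(slot, producer_buf);
}

// Hypothetical helper: hand over by moving. The producer's handle is left
// null, so it must allocate a new vector before filling the next block.
Buffer hand_over_by_move(Buffer& producer_buf) {
    return std::move(producer_buf);
}
```

As long as every queue slot is preallocated, swapping never allocates; this is what the locking buffer in the edit below relies on.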

Am I missing a flaw in that design?

Are there thread-safe or even lock-free ring buffer implementations allowing swap operations for push and pop?

Edit: I modified a locking ring buffer to swap unique_ptrs. The performance gain is huge, though it doesn't feel like an elegant solution. Any recommendations?

// Adapted from https://github.com/embeddedartistry/embedded-resources/blob/master/examples/cpp/circular_buffer.cpp

#include <memory>
#include <mutex>
#include <utility>  // std::swap

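template <typename T, int SIZE>
class RingbufferPointer {
public:
    // Public so callers can declare their own spare/working pointers.
    using TPointer = std::unique_ptr<T>;

    RingbufferPointer() {
        // Pre-allocate one object per slot so every swap hands back a valid object
        for (int i = 0; i < SIZE; i++) {
            buf_[i] = std::make_unique<T>();
        }
    }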

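    // Swap a filled item into the buffer. On success, `item` receives the
    // slot's previous (recycled) object to fill next. Returns false if full.
    bool push(TPointer &item) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (full_)
            return false;

        std::swap(buf_[head_], item);

        head_ = (head_ + 1) % max_size_;
        full_ = head_ == tail_;

        return true;
    }

    // Swap the oldest item out of the buffer. `item` should hold a spare
    // object, which is parked in the freed slot. Returns false if empty.
    bool pop(TPointer &item) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (empty_unlocked())
            return false;

        std::swap(buf_[tail_], item);

        full_ = false;
        tail_ = (tail_ + 1) % max_size_;

        return true;
    }

    void reset() {
        std::lock_guard<std::mutex> lock(mutex_);
        head_ = tail_;
        full_ = false;
    }

    // Observers lock too; their result may be stale as soon as they return.
    bool empty() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return empty_unlocked();
    }

    bool full() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return full_;
    }

    int capacity() const {
        return max_size_;
    }

    int size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        if (full_)
            return max_size_;
        return (head_ >= tail_) ? head_ - tail_ : max_size_ + head_ - tail_;
    }

private:
    // Caller must already hold mutex_.
    bool empty_unlocked() const {
        return !full_ && head_ == tail_;
    }

    TPointer buf_[SIZE];

    mutable std::mutex mutex_;  // mutable so the const observers can lock it
    int head_ = 0;
    int tail_ = 0;
    const int max_size_ = SIZE;
    bool full_ = false;
};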

Solution

"Moving smart pointers doesn't fit my needs, because therefore I need to allocate memory during runtime constantly for new vector elements."

That's not necessarily true if you pre-allocate enough storage up front and implement your own memory management à la simple segregated storage, a.k.a. pooling.
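A minimal sketch of the pooling idea, assuming a fixed number of equally sized `std::vector<float>` buffers (`VectorPool`, `Handle` and `Releaser` are hypothetical names, not boost::pool's API). Every vector is allocated once in the constructor, and a `unique_ptr` with a custom deleter returns the vector to a free list instead of freeing it, so moving such a handle through a queue costs no allocation. The pool itself is guarded by a mutex, not lock-free.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical fixed-size pool: all vectors are allocated once up front and
// recycled through a free list (the idea behind simple segregated storage).
class VectorPool {
public:
    // Deleter that hands the vector back to the pool instead of deleting it.
    struct Releaser {
        VectorPool* pool;
        void operator()(std::vector<float>* v) const { pool->release(v); }
    };
    using Handle = std::unique_ptr<std::vector<float>, Releaser>;

    VectorPool(std::size_t count, std::size_t elems) {
        storage_.reserve(count);
        free_.reserve(count);  // release() never has to grow the free list
        for (std::size_t i = 0; i < count; ++i) {
            storage_.push_back(std::make_unique<std::vector<float>>(elems));
            free_.push_back(storage_.back().get());
        }
    }

    // Returns an empty handle if the pool is exhausted (no fallback allocation).
    Handle acquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (free_.empty())
            return Handle(nullptr, Releaser{this});
        std::vector<float>* v = free_.back();
        free_.pop_back();
        return Handle(v, Releaser{this});
    }

    std::size_t available() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_.size();
    }

private:
    // Called by a handle's deleter, possibly from another thread.
    void release(std::vector<float>* v) {
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push_back(v);
    }

    std::vector<std::unique_ptr<std::vector<float>>> storage_;  // owns the memory
    std::vector<std::vector<float>*> free_;                     // free list
    mutable std::mutex mutex_;
};
```

The pool must outlive every handle it gives out. Whether you exhaust the pool or size it generously is a design choice; here `acquire()` reports exhaustion instead of silently allocating.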

If you do that, nothing stops you from swapping pointers around: you keep your existing architecture, can use any ring buffer that supports swapping elements, and retain the same thread safety you had before. You could also consider simply using boost::pool instead of implementing your own.
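One way to get the lock-free variant the question asks about, sketched under assumptions: this is a related technique rather than the boost::pool route, and `SpscRing` and `run_pipeline` are hypothetical names, with `SpscRing` a minimal stand-in for boost::lockfree::spsc_queue. All vectors are preallocated once; raw pointers then travel through two single-producer/single-consumer queues, one carrying filled buffers to the consumer and one returning empty buffers to the producer. Only pointers cross the queues, so there is neither copying nor runtime allocation.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Minimal single-producer/single-consumer lock-free ring of pointers
// (hypothetical stand-in for boost::lockfree::spsc_queue<T*>).
// One slot stays unused to tell "full" from "empty", so it holds N - 1 items.
template <typename T, std::size_t N>
class SpscRing {
public:
    bool push(T* p) {  // call from the producer thread only
        const std::size_t h = head_.load(std::memory_order_relaxed);
        const std::size_t next = (h + 1) % N;
        if (next == tail_.load(std::memory_order_acquire))
            return false;  // full
        buf_[h] = p;
        head_.store(next, std::memory_order_release);
        return true;
    }
    bool pop(T*& p) {  // call from the consumer thread only
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t == head_.load(std::memory_order_acquire))
            return false;  // empty
        p = buf_[t];
        tail_.store((t + 1) % N, std::memory_order_release);
        return true;
    }
private:
    std::array<T*, N> buf_{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};

using Vec = std::vector<float>;

// Producer fills `blocks` buffers in place, consumer sums them; returns the total.
double run_pipeline(std::size_t blocks, std::size_t elems) {
    constexpr std::size_t kSlots = 4;                   // each ring holds 3 pointers
    std::vector<Vec> storage(kSlots - 1, Vec(elems));  // allocated once, up front
    SpscRing<Vec, kSlots> filled;  // producer -> consumer
    SpscRing<Vec, kSlots> spare;   // consumer -> producer
    for (Vec& v : storage) spare.push(&v);

    std::thread producer([&] {
        for (std::size_t i = 0; i < blocks; ++i) {
            Vec* v = nullptr;
            while (!spare.pop(v)) std::this_thread::yield();   // wait for an empty buffer
            for (float& x : *v) x = 1.0f;                       // "process" in place
            while (!filled.push(v)) std::this_thread::yield();
        }
    });

    double total = 0.0;
    for (std::size_t i = 0; i < blocks; ++i) {
        Vec* v = nullptr;
        while (!filled.pop(v)) std::this_thread::yield();
        for (float x : *v) total += x;
        while (!spare.push(v)) std::this_thread::yield();      // recycle the buffer
    }
    producer.join();
    return total;
}
```

Each queue must have exactly one pushing thread and one popping thread. If boost is available, `boost::lockfree::spsc_queue<std::vector<float>*>` could likely take the place of `SpscRing`, since it would only ever copy the pointers.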