Search code examples
c++multithreadingc++20atomicmemory-barriers

Lock-free thread safe lazy manual initialization of singleton in modern C++


I know Singleton is not good and that in general the thread safe lazy initialization in modern C++ can be done by static local variable, but I have quite specific use case and although I can use safer "less-performant" techniques(double check with mutex for the second check), I still would love to verify my understandings on atomic variables and memory ordering.

Also the main reason I cannot use static local variable is because it needs to be shared between DLL/SO boundaries. - Please ignore that statement and just accept that I want to do the initialization myself.

This is the code which came to my mind as the most performant lock-free solution without using stronger memory barriers. It seems to work, but I do wonder if that is really the case.

Is my usage of memory ordering correct here? Will it have the expected behavior?

// in header file
class DLL_EXPORT MyLogger
{
public:
    void logMessage(std::string message);

private:
    static std::atomic<MyLogger*> sSharedInstance;
}

// in cpp file
constinit std::atomic<MyLogger*> MyLogger::sSharedInstance{nullptr};

MyLogger& MyLogger::sharedInstance()
{
    MyLogger* logger = sSharedInstance.load(std::memory_order_relaxed);
    if (!logger)
    {
        logger = new MyLogger(); // The constructor is no-op, so redundant instances are not an issue.
        MyLogger* previousLogger = nullptr;
        if (!sSharedInstance.compare_exchange_strong(previousLogger, logger, std::memory_order_relaxed, std::memory_order_relaxed))
        {
            // Lost the init race - destroy the redundant logger
            delete logger;
            logger = previousLogger;
        }
    }

    return *logger;
}

// in different threads
void someFunction();
{
    // Can be called from many threads...
    MyLogger::sharedInstance().logMessage("some message");
}


Solution

  • After tons of research and discussion and thanks to the answer of @user17732522 and the really good comments from @PeterCordes I finally understand the issues in the code from my question. There are two changes which have to be made. The load operation have to become std::memory_order_consume and the third parameter(success case) of compare_exchange_strong call should become std::memory_order_release.

    I will try to explain as much as I can why and how things happens here, but first lets establish some base ground:

    • Compiler and or CPU cannot change the observable logic of the function in a single threaded perspective.
    • Atomic operations on an atomic variables are... well guaranteed to be atomic even in the relaxed memory ordering, you cannot have middle state on the variable there.

    An example issue about the compare_exchange_strong in my question is that the compiler can reorder the object creation(as @user17732522 pointed out). So imagine the following theoretical scenario:

    • The memory for the MyLogger is allocated and the address is assigned to the local logger variable. However, the constructor still might not be called as the compiler and/or CPU can reorder that operation, because of the relaxed semantics and because the compare_exchange_strong only rely on the pointer address, not on its object state.
    • Another thread enters and reads the pointer address, while the constructor of MyLogger() is still not called or in progress, thus resulting in having valid memory address pointing to an object in an unknown state.
    • Usages after that are undefined behavior...

    To the rescue comes std::memory_order_consume for the load operation and std::memory_order_release for the success case of the compare-exchange. The release semantics make it so no operations can be reordered after it, thus guaranteeing that both the local variable logger gets the memory address and the construction of the MyLogger instance is completed, thus when the pointer address is written to sSharedInstance by the compare_exchange_strong() call it is guaranteed that the object will be in a fully constructed state.

    We can leave the failure case of compare_exchange_strong() to std::memory_order_relaxed, because it uses a local stack variable which of course can be seen only by a single thread.

    Now, initially I thought that at least the load operation can be kept with relaxed memory ordering, but thanks to @PeterCordes now I understand that due to some specific CPUs(notably DEC Alpha), even though the constuctor have finished execution, before the assignment of the new address to sSharedInstance another thread might be able to see the new pointer address but still not observe the object values which were changed by the MyLogger() constuctor... This is why for the load operation at least consume semantics are needed.

    NOTE: std::memory_order_consume is temporary discouraged by the standard due to its complicated nature. Most compilers automatically promote it to std::memory_order_acquire, but in order to have extra safety, it would be better to use std::memory_order_acquire in the code too. For more info. check out this link: https://en.cppreference.com/w/cpp/atomic/memory_order#Release-Consume_ordering