Search code examples
c++multithreadingstaticcpuid

Thread safety for deterministic one-time write to static location?


I'm checking CPUID flags on x86-64 for instruction set extensions, and caching the results in a static array. The function doing this may be called from any of several different threads, so race conditions are possible - but does this matter? Any thread will produce the same results, so I don't care about conflicting writes, and checking the same flags more than once is a non-issue. My conclusion is that this is safe, but am I missing something?

Here's a stripped-down version of what I'm doing:

enum class Arch : int
{
    SSE2,
    AVX,
    MAX
};

static bool CheckCPUID(Arch arch)
{
    // Check CPU features
}

static bool IsSupportedArch(Arch arch)
{
    enum class TriVal : int
    {
        Unknown,
        True,
        False
    };

    constexpr int arch_max = static_cast<int>(Arch::MAX);
    const int i_arch = static_cast<int>(arch);

    // Cache results in-memory:
    static TriVal cached_results[arch_max - 1] = { TriVal::Unknown, TriVal::Unknown };

    if (cached_results[i_arch] == TriVal::Unknown)
    {
        cached_results[i_arch] = CheckCPUID(arch) ? TriVal::True : TriVal::False;
    }

    return cached_results[i_arch] == TriVal::True;
}

Solution

  • so I don't care about conflicting writes

    Conflicting writes to non-atomic objects are a data race and a data race always causes undefined behavior. The compiler may optimize based on that.

    Therefore you must care about this.

    You can use a std::array<TriVal, arch_max - 1> for the static local variable cached_results and have it be set in its initializer. The initialization of static local variables is guaranteed to be thread-safe and executed exactly once (assuming C++11 or later).

    static bool IsSupportedArch(Arch arch)
    {
        enum class TriVal : int
        {
            Unknown,
            True,
            False
        };
    
        constexpr int arch_max = static_cast<int>(Arch::MAX);
        const int i_arch = static_cast<int>(arch);
    
        // Cache results in-memory:
        static const auto cached_results = []{
            std::array<TriVal, arch_max-1> cached_results;
            
            // loop here to set the whole array
    
            return cached_results;
        }();
    
        return cached_results[i_arch] == TriVal::True;
    }
    

    Alternatively you can mark the variable thread_local, but in that case every thread will call CPUID once.

    If you really care that CPUID is called only as few number of times as required, you can make the array elements std::atomics:

    static std::atomic<TriVal> cached_results[arch_max - 1]{};
    

    In that case it is ok to have concurrent writes and as you say you don't care about which one ends up being the value. You probably want to replace the loads and stores with relaxed memory order operations though, since you don't need the default seq_cst guarantees:

    static bool IsSupportedArch(Arch arch)
    {
        enum class TriVal : int
        {
            Unknown,
            True,
            False
        };
    
        constexpr int arch_max = static_cast<int>(Arch::MAX);
        const int i_arch = static_cast<int>(arch);
    
        // Cache results in-memory:
        static std::atomic<TriVal> cached_results[arch_max - 1]{};
    
        auto value = cached_results[i_arch].load(std::memory_order_relaxed);
    
        if (value == TriVal::Unknown)
        {
            // OK only if CheckCPUID is guaranteed to always
            // return the same value
            value = CheckCPUID(arch) ? TriVal::True : TriVal::False;
            cached_results[i_arch].store(value, std::memory_order_relaxed);
        }
    
        return value == TriVal::True;
    }