c++, c++11, x86, memory-model, stdatomic

x86 relaxed ordering performance?


Since Intel provides a strong hardware memory model, is there any advantage at all to using memory_order_relaxed in a C++11 program? Or should I just leave it at the default, sequentially consistent, since it makes no difference?


Solution

  • Like most answers in computer science, the answer to this is "that depends."

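    First of all, the idea that sequentially consistent ordering never carries any penalty on x86 is incorrect. Depending on your code (and possibly your compiler), it can and will carry a penalty. For example, a sequentially consistent store is typically compiled to an XCHG (or a MOV followed by an MFENCE), whereas a release or relaxed store is an ordinary MOV. Stronger orderings also limit how freely the compiler can reorder and combine the surrounding code.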

    Second, to make intelligent decisions about the memory ordering constraints, you need to think about (and understand) how you're using the data involved.

    memory_order_relaxed is useful for something like a standalone counter that needs to be atomic but isn't directly related to anything else, so it doesn't need to be consistent with any "something else". The typical example is a reference count, such as in shared_ptr or some older implementations of std::string. In this case, we just need to ensure that the counter is incremented and decremented atomically and that all threads agree on a single order of modifications to it. In particular, there isn't any related data with which it needs to remain consistent, so we don't care much about its ordering with respect to anything else. (Strictly speaking, that holds for the increments; the decrement that may drop the count to zero needs release ordering, plus an acquire before the object is destroyed, so that other threads' final uses of the object happen before it is deleted.)

    Sequentially consistent ordering is at pretty much the opposite extreme. It's probably the easiest to apply: you write the code almost as if it were single-threaded, and the implementation ensures that it works correctly. That's not to say you can ignore threading entirely, but sequential consistency generally requires the least thought about it. It is also generally the slowest model.

    Acquire/release ordering is normally used when you have two or more related pieces of information and you need to ensure that one becomes visible only after the other. For one example that I dealt with recently, suppose you're building something roughly like an in-memory database. You have some data and some metadata (and you're storing each more or less separately).

    The metadata is used (among other things) for searching the database. We want to ensure that if somebody finds some particular data, the data they found is actually present in the database.

    To guarantee this, the data must always be present before the metadata and must continue to exist at least as long as the metadata does. The database would be inconsistent if somebody could search it using the metadata and find some data they want to use when that data isn't actually present.

    So when we're adding a record, we need to add the data first and then add the metadata, and neither the compiler nor the CPU may reorder the two. Likewise, when we're deleting a record, we need to delete the metadata first (so nobody will find the data), then delete the data itself. For the data itself, chances are we have a reference count tracking how many clients are currently accessing it, so that we don't delete it while somebody is still using it.

    So in this case, we can use acquire/release semantics for the metadata and data, and relaxed ordering for the reference count (apart from the final decrement, which needs release/acquire). Or, if we want to keep our code as simple as possible, we could use sequential consistency throughout, even though it might (and probably will) carry at least some penalty.