C++0x concurrency and synchronizes-with: is the fence needed?

I've recently asked a few questions about atomics and C++0x, and I'd like to ensure I understand the ordering semantics before I convert any code. Let's say we have this pre-0x code:

atomic_int a = 0;
some_struct b;

Thread A:
b = something;
atomic_store_fence();
a = 1;

Thread B:
if( a == 1 )
{
  atomic_load_fence();
  proc(b);
}

Here atomic_int, atomic_store_fence and atomic_load_fence stand for whatever your current compiler/platform offers.

In C++0x the code has a few possible forms. Two obvious ones appear to be:

atomic<int> a = ATOMIC_VAR_INIT(0);
some_struct b;

Thread A:
b = something;
atomic_thread_fence( memory_order_release );
a.store( 1, memory_order_relaxed );

Thread B:
if( a.load( memory_order_relaxed ) == 1)
{
  atomic_thread_fence( memory_order_acquire );
  proc(b);
}

Thread A:
b = something;
a.store( 1, memory_order_release );

Thread B:
if( a.load( memory_order_acquire ) == 1)
{
  proc(b);
}

Am I correct in reading that an atomic store-release / load-acquire sequence is a synchronizes-with event that has the same memory order implications as the explicit fence version? That is, is the second version correct?

If that is correct, then the second version applies acquire ordering more often than necessary: on every load, even when a != 1. Section 29.8-3 of the standard indicates I can mix and match atomics and fences. So is the code below a correct and reasonable implementation?

Thread A:
b = something;
a.store( 1, memory_order_release );

Thread B:
if( a.load( memory_order_relaxed ) == 1 )
{
  atomic_thread_fence( memory_order_acquire );
  proc(b);
}

Solution

Yes, your understanding is correct, and yes, the final listing is a reasonable implementation.

Note that ATOMIC_VAR_INIT is primarily provided for compatibility with C1X. In C++0x you can just write:

std::atomic<int> a(0);
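
For illustration, here is a minimal sketch of the final listing from the question as a complete program fragment. The layout of some_struct, the spin loop in place of the single if, and the names producer, consumer and run_demo are assumptions made for the example; they are not from the question. The release store in thread A synchronizes with the acquire fence in thread B, because the relaxed load that reads the value 1 is sequenced before that fence.

```cpp
#include <atomic>
#include <thread>

// Hypothetical payload type; the question does not define some_struct.
struct some_struct { int x; int y; };

std::atomic<int> a(0);
some_struct b;  // plain, non-atomic data published via the flag a

// Thread A: write b, then publish it with a release store on a.
void producer() {
    b = some_struct{1, 2};
    a.store(1, std::memory_order_release);
}

// Thread B: poll with relaxed loads, and issue the acquire fence
// only after the flag has actually been observed as 1.
int consumer() {
    while (a.load(std::memory_order_relaxed) != 1) {
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_acquire);
    // The write to b happens-before this read, so there is no data race.
    return b.x + b.y;
}

// Runs both threads once and returns what the consumer saw.
int run_demo() {
    a.store(0, std::memory_order_relaxed);  // reset the flag before starting the threads
    int result = 0;
    std::thread tb([&result] { result = consumer(); });
    std::thread ta(producer);
    ta.join();
    tb.join();
    return result;
}
```

Calling run_demo() returns 3 here, since the consumer can only leave the loop after reading the value stored by the producer. The loop is used so the demo always terminates with a result; with the single if from the question, thread B would simply skip proc(b) whenever the flag is not yet set.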