Can we not add an acquire barrier when locking if there is no release (unlock) from other threads?

This question is a follow-up to an earlier one:

Do we need a memory acquire barrier for one-shot spinlocks?

Under normal circumstances, we need to add acquire semantics when locking and release semantics when unlocking to prevent competition, like this:

code block0:

atomic_int lock = 0;

// many threads
void threads(void)
{
    while (atomic_exchange_explicit(&lock, 1, memory_order_acquire)); // acquire lock
    // do something
    atomic_store_explicit(&lock, 0, memory_order_release); // release
}

But in the discussion of the previous question, there is at least one situation where there is no need to acquire when locking:

there is only one thread that will execute the relevant code, and there is no competition.

Some people think that this is not a locking algorithm, but I still use try-lock to represent it here because it is convenient to express.

code block1:

atomic_int lock = 0;

// many threads
void threads(void)
{
    int v = 0;
    // try_lock: we can use relaxed here
    if (!atomic_compare_exchange_strong_explicit(&lock, &v, 1, memory_order_relaxed, memory_order_relaxed)) {
        return; // return if try_lock failed
    }
    // do something
    // never unlock
}

Now to the actual question: can we not add an acquire barrier when locking if there is no release (unlock) from other threads?

For example:

code block2:

#define EXIT_FLAG 1
#define WORK_FLAG 2

atomic_int state = 0;

void thread0(void)
{
    int tmp;
    while (1) {
        tmp = 0;
        // do we need acquire here?
        if (!atomic_compare_exchange_strong_explicit(&state, &tmp, WORK_FLAG, memory_order_relaxed, memory_order_relaxed)) {
            assert(tmp == EXIT_FLAG);
            return;
        }

        // do work

        tmp = WORK_FLAG;
        // release is required here to pair with the acquire in thread1
        if (!atomic_compare_exchange_strong_explicit(&state, &tmp, 0, memory_order_release, memory_order_relaxed)) {
            assert(tmp == (EXIT_FLAG | WORK_FLAG));
            // do the clean
            return;
        }
    }
}

void thread1(void)
{
    int tmp = 0;

    while (1) {
        // acquire is required here to pair with the release in thread0
        if (atomic_compare_exchange_strong_explicit(&state, &tmp, tmp | EXIT_FLAG, memory_order_acquire, memory_order_relaxed))
            break;
    }

    if (!(tmp & WORK_FLAG)) {
        // do the clean
    }
}

We don't want do-work and do-clean to race. I don't think the specific content of do-work and do-clean matters, but for the sake of clarity, here is a more concrete example:

code block3:

#define EXIT_FLAG 1
#define WORK_FLAG 2

atomic_int state = 0;
struct work_struct *work_data; // allocated with calloc

void thread0(void)
{
    int tmp;
    while (1) {
        tmp = 0;
        // do we need acquire here?
        if (!atomic_compare_exchange_strong_explicit(&state, &tmp, WORK_FLAG, memory_order_relaxed, memory_order_relaxed)) {
            assert(tmp == EXIT_FLAG);
            return;
        }

        // read from and write to *work_data

        tmp = WORK_FLAG;
        // release is required here to pair with the acquire in thread1
        if (!atomic_compare_exchange_strong_explicit(&state, &tmp, 0, memory_order_release, memory_order_relaxed)) {
            assert(tmp == (EXIT_FLAG | WORK_FLAG));
            free(work_data);
            work_data = NULL;
            return;
        }
    }
}

void thread1(void)
{
    int tmp = 0;

    while (1) {
        // acquire is required here to pair with the release in thread0
        if (atomic_compare_exchange_strong_explicit(&state, &tmp, tmp | EXIT_FLAG, memory_order_acquire, memory_order_relaxed))
            break;
    }

    if (!(tmp & WORK_FLAG)) {
        free(work_data);
        work_data = NULL;
    }
}

I think this example can be abstracted as:

code block4:

lock_t lock;

// worker
void thread0(void)
{
    while (1) {
        if (!try_lock_relaxed(&lock)) // need acquire here? I think not
            return;
        // do work
        unlock_release(&lock);  // release is required
    }
}

// many threads
void threads(void)
{
    // acquire is required here, to pair with unlock_release from thread0
    if (!try_lock_acquire(&lock))
        return;
    // do free or do work
    // never unlock
}

In this example, for thread0, there are no unlock (release) operations from other threads; the only unlock operation comes from itself. In this case, is it still necessary to acquire when locking?

Solution

"Under normal circumstances, we need to add acquire semantics when locking and release semantics when unlocking to prevent competition"

No, preventing competition is not the purpose of memory ordering. You have no need for memory ordering unless there is competition, in the sense of conflicting memory accesses.¹ The purpose of memory ordering is to provide for defined behavior despite such conflicting accesses, both in general and in the form of some guarantees and constraints on which non-atomic reads can or will observe which conflicting writes.

With respect specifically to locking, a lock, as typically understood and implemented, provides exclusivity to the thread that holds it. Often that means that only one thread at a time can execute any of the regions protected by a given lock. That has some relevance to preventing (a different form of) competition, but it does not have to have any associated memory ordering beyond that for the lock itself.

However, inasmuch as one of the main purposes of such locks is to maintain consistency of shared data structures, we typically do want to lock with acquire semantics and unlock with release semantics. Having tied such memory ordering semantics to a lock, if all threads that access the shared data do so only while holding the lock, each one sees the most recent modifications to that shared data by any other thread that previously held the lock. Likewise, its own modifications are observable by other threads that hold the lock afterward. That is usually desirable.
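
As a minimal sketch of that guarantee, the program below has two threads increment a plain (non-atomic) counter under a spinlock that locks with acquire and unlocks with release. The names demo_lock, shared_counter, increment_worker and run_counter_demo are invented for illustration, and POSIX threads are assumed only as a convenient way to start the threads; none of this comes from the question.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define ITERATIONS 10000

static atomic_int demo_lock = 0;   /* hypothetical lock word: 0 = free, 1 = held */
static long shared_counter = 0;    /* plain data, accessed only while holding demo_lock */

static void *increment_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        /* Lock: the acquire pairs with the previous holder's release store. */
        while (atomic_exchange_explicit(&demo_lock, 1, memory_order_acquire))
            ;   /* spin until the previous value was 0 */
        shared_counter++;   /* sees the previous holder's increment */
        /* Unlock: the release publishes this increment to the next holder. */
        atomic_store_explicit(&demo_lock, 0, memory_order_release);
    }
    return NULL;
}

/* Runs two incrementing threads and returns the final counter value. */
long run_counter_demo(void)
{
    pthread_t a, b;
    int rc;

    shared_counter = 0;
    rc = pthread_create(&a, NULL, increment_worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, increment_worker, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_counter;   /* safe to read: both threads have been joined */
}
```

Each successful lock reads the 0 written by the previous unlock, so every increment happens before the next one, and run_counter_demo() should return 2 * ITERATIONS.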

"there is at least one situation where there is no need to acquire when locking:

there is only one thread that will execute the relevant code, and there is no competition."

Not exactly. Again, memory ordering is about conflicting memory accesses, not so much about which code is executed. It is possible for two threads executing entirely different code to have conflicting memory accesses, and it is possible for two threads executing the same code to not have any conflicting accesses. It's about which objects are accessed, in what way, not about what code is executed.

Since inter-thread memory ordering is about conflicting accesses, yes, if there is no possibility of conflicting accesses then there is no need for any inter-thread memory ordering, even though you might still want locking. But general-purpose locks will acquire, because that is needed for most of their use cases.
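
To make the "same code, no conflicting accesses" case concrete, here is a small sketch in which several threads run the identical function but each touches only its own array element. The names slots, fill_slot and run_disjoint_demo are hypothetical, and POSIX threads are an assumption made for the sake of a runnable example.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 4

static long slots[NUM_THREADS];   /* one element per thread, never shared while running */

/* Every thread executes this same code, but on a different object. */
static void *fill_slot(void *arg)
{
    long *slot = arg;
    for (int i = 1; i <= 1000; i++)
        *slot += i;   /* no atomics needed: no other thread accesses *slot */
    return NULL;
}

/* Starts the threads, joins them, and returns the sum of all slots. */
long run_disjoint_demo(void)
{
    pthread_t tids[NUM_THREADS];
    long total = 0;

    for (int i = 0; i < NUM_THREADS; i++) {
        slots[i] = 0;
        int rc = pthread_create(&tids[i], NULL, fill_slot, &slots[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(tids[i], NULL);   /* join orders the thread's writes before this read */
        total += slots[i];
    }
    return total;
}
```

No lock or memory ordering appears between the worker threads because no two of them access the same object; thread creation and joining provide the only ordering the main thread needs.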

"can we not add an acquire barrier when locking if there is no release (unlock) from other threads?"

Yes, you can omit it. You could also add one anyway, but doing so serves no purpose: without the possibility of a release for it to synchronize with, an acquire does nothing for you.

And this is a possible scenario for the aforementioned general-purpose lock. Locking it will perform an acquire. That is not harmful, except possibly for a tiny one-time performance cost, if no release is ever performed by unlocking the lock.
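
For contrast, here is a sketch of the situation in which an acquire does matter: it reads a value written by a release in another thread, which makes that thread's earlier plain writes visible. The names payload, ready, publisher, subscriber and run_publish_demo are hypothetical, and POSIX threads are assumed for thread creation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;            /* plain data written before publication */
static atomic_int ready = 0;   /* publication flag */

static void *publisher(void *arg)
{
    (void)arg;
    payload = 42;
    /* Release: the write to payload cannot become visible after this store. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *subscriber(void *arg)
{
    int *out = arg;
    /* Acquire: once this load reads 1, the publisher's earlier writes are visible. */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;   /* spin until published */
    *out = payload;
    return NULL;
}

/* Runs one publisher and one subscriber; returns what the subscriber read. */
int run_publish_demo(void)
{
    pthread_t p, s;
    int seen = 0;
    int rc;

    payload = 0;
    atomic_store(&ready, 0);
    rc = pthread_create(&s, NULL, subscriber, &seen);
    assert(rc == 0);
    rc = pthread_create(&p, NULL, publisher, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(p, NULL);
    pthread_join(s, NULL);
    return seen;
}
```

If no thread ever performed the release store, the acquire load would have nothing to synchronize with, which is the case the paragraph above describes.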

"In [the code block 4] example, for thread0, there are no unlock (release) operations from other threads; the only unlock operation comes from itself. In this case, is it still necessary to acquire when locking?"

I think you're assuming that only one thread executes thread0() during any given run of the program. Again, inter-thread memory ordering is about memory operations, not regions of code, so if more than one thread might execute thread0() and thereby produce conflicting accesses, then you would definitely need the acquire for well-defined behavior.

If only one thread runs thread0(), all other threads run threads(), and we are concerned only with conflicting accesses appearing in // do free or do work of threads() and // do work of thread0(), then no, thread0() does not need the acquire. No two accesses from the same thread conflict with each other, and if // do work is executing then there cannot yet have been any memory accesses by // do free or do work to constitute a conflict. There may be subsequent conflicting accesses from // do free or do work, but these are OK because in between there must be a release by thread0() and an acquire by the relevant execution of threads().
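
The reasoning above can be tried out with a runnable sketch modeled on the question's code block 3, assuming exactly one worker thread and one exiting thread. The definition of struct work_struct, the helper do_clean, the cleanup_count counter and the function run_exit_protocol_demo are hypothetical additions for illustration, and POSIX threads are assumed for thread creation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define EXIT_FLAG 1
#define WORK_FLAG 2

struct work_struct { long counter; };   /* hypothetical payload */

static atomic_int state = 0;
static struct work_struct *work_data;
static int cleanup_count;   /* touched only by the single thread that cleans up */

static void do_clean(void)
{
    free(work_data);
    work_data = NULL;
    cleanup_count++;
}

static void *worker(void *arg)   /* plays the role of thread0 */
{
    (void)arg;
    for (;;) {
        int tmp = 0;
        /* Relaxed "lock": the only release this could pair with is our own. */
        if (!atomic_compare_exchange_strong_explicit(&state, &tmp, WORK_FLAG,
                memory_order_relaxed, memory_order_relaxed)) {
            assert(tmp == EXIT_FLAG);
            return NULL;   /* the exiting thread owns the cleanup */
        }
        work_data->counter++;   /* do work */
        tmp = WORK_FLAG;
        /* Release "unlock": pairs with the acquire in exiter(). */
        if (!atomic_compare_exchange_strong_explicit(&state, &tmp, 0,
                memory_order_release, memory_order_relaxed)) {
            assert(tmp == (EXIT_FLAG | WORK_FLAG));
            do_clean();    /* exit was requested while we were working */
            return NULL;
        }
    }
}

static void *exiter(void *arg)   /* plays the role of thread1 */
{
    (void)arg;
    int tmp = 0;
    /* Acquire: makes the worker's earlier accesses to *work_data visible. */
    while (!atomic_compare_exchange_strong_explicit(&state, &tmp, tmp | EXIT_FLAG,
            memory_order_acquire, memory_order_relaxed))
        ;   /* tmp now holds the current value; retry with it */
    if (!(tmp & WORK_FLAG))
        do_clean();
    return NULL;
}

/* Runs the protocol once and returns how many times cleanup was performed. */
int run_exit_protocol_demo(void)
{
    pthread_t w, e;
    int rc;

    atomic_store(&state, 0);
    cleanup_count = 0;
    work_data = calloc(1, sizeof *work_data);
    assert(work_data != NULL);
    rc = pthread_create(&w, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&e, NULL, exiter, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(w, NULL);
    pthread_join(e, NULL);
    assert(work_data == NULL);
    return cleanup_count;
}
```

Whichever thread ends up freeing work_data, it does so exactly once: either exiter() observed the worker's released 0 through its acquire, or the worker found EXIT_FLAG set when it tried to unlock and cleaned up itself.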

¹ Used here in the sense relevant to data races in the C language specification: two accesses to the same non-atomic object by different threads, at least one of which is a write.