process operating-system thread-synchronization

acquire() and release() lock operations with testandset()

I need to solve the following problem:

a. Show how to implement acquire() and release() lock operations using TestandSet instruction.

b. Identify a performance problem, that could occur in your solution when it runs on a multiprocessor, but does not occur on a uniprocessor. Describe a concrete scenario where the performance problem arises.

c. Describe an alternative lock implementation that reduces the performance problem in b, and explain how it helps in the concrete scenario you presented in b.

I have my acquire() and release() setup like these:

acquire() {
    while(TestandSet(true)){
        //wait for lock to be released
    {
}
release() {
    TestandSet(false);
}

However, I could not identify any performance issue regarding multiple processors or a single processor. What is the performance issue? Or, is my implementation of acquire() and release() correct?

Solution

Found on the testAndSet wiki:

The four major evaluation metrics for locks in general are uncontended lock-acquisition latency, bus traffic, fairness, and storage.

Test-and-set scores low on two of them, namely, high bus traffic and unfairness.

When processor P1 has obtained a lock and processor P2 is also waiting for the lock, P2 will keep incurring bus transactions in attempts to acquire the lock. When a processor has obtained a lock, all other processors which also wish to obtain the same lock keep trying to obtain the lock by initiating bus transactions repeatedly until they get hold of the lock. This increases the bus traffic requirement of test-and-set significantly. This slows down all other traffic from cache and coherence misses. It slows down the overall section, since the traffic is saturated by failed lock acquisition attempts. Test-and-test-and-set is an improvement over TSL since it does not initiate lock acquisition requests continuously.

When we consider fairness, we consider if a processor gets a fair chance of acquiring the lock when it is set free. In an extreme situation the processor might starve i.e. it might not be able to acquire the lock for an extended period of time even though it has become free during that time.

Storage overhead for TSL is next to nothing since only one lock is required. Uncontended latency is also low since only one atomic instruction and branch are needed.