How do you implement Software Transactional Memory?

In terms of actual low level atomic instructions and memory fences (I assume they're used), how do you implement STM?

The part that's mysterious to me is that given some arbitrary chunk of code, you need a way to go back afterwards and determine if the values used in each step were valid. How do you do that, and how do you do it efficiently? This would also seem to suggest that just like any other 'locking' solution you want to keep your critical sections as small as possible (to decrease the probability of a conflict), am I right?

Also, can STM simply detect "another thread entered this area while the computation was executing, therefore the computation is invalid" or can it actually detect whether clobbered values were used (and thus by luck sometimes two threads may execute the same critical section simultaneously without need for rollback)?

Solution

The simplest answer is "it depends". There are tons of radically different implementations working in pretty much any way imaginable.

The part that's mysterious to me is that given some arbitrary chunk of code, you need a way to go back afterward and determine if the values used in each step were valid. How do you do that, and how do you do it efficiently?

One solution is to use versioning. Every time an object is modified, its version number is updated. While the transaction is running, you validate each accessed object's version, and when the transaction commits, you verify that the objects are still valid. This validation can be a simple integer comparison (if transaction_start_version >= object_version, the object is valid), so it can be done fairly efficiently.

This would also seem to suggest that just like any other 'locking' solution you want to keep your critical sections as small as possible (to decrease the probability of a conflict), am I right?

Very likely. I think a few implementations have gone the route of assuming/requiring everything to be a transaction, but yes, in most implementations, transactions are specially marked chunks of code, and the longer a transaction runs, the larger the chance of a conflict that may cause transactions to roll back.

Also, can STM simply detect "another thread entered this area while the computation was executing, therefore the computation is invalid" or can it actually detect whether clobbered values were used (and thus by luck sometimes two threads may execute the same critical section simultaneously without need for rollback)?

The latter. Keep in mind that the idea in TM is to protect data, rather than code.

Different code paths may access the same variable in different transactions. This has to be detected by the TM system. There's no real notion of "this area", since that refers to code, rather than data. The TM system doesn't care what code is executing, it tracks what data is being modified. In that way, it differs entirely from critical sections (which protect code, rather than data)