Lack of fairness in STM, why can't blocked threads be woken up in FIFO order?

I'm revisiting the STM chapter of Marlow's book. There, it is stated that:

When multiple threads block on an MVar, they are guaranteed to be woken up in FIFO order

However, the same can't be done on the STM case:

A transaction can block on an arbitrary condition, so the runtime doesn't know whether any individual transaction will be able to make progress after the TVar is changed; it must run the transaction to find out. Hence, when there are multiple transactions that might be unblocked, we have to run them all; after all, they might all be able to continue now.

What I don't get is why from this it follows that

Because the runtime has to run all the blocked transactions, there is no guarantee that threads will be unblocked in FIFO order ...

I'd expect that even though we have to run all the transactions in an STM block, we can still wake the threads up in a FIFO order. So I guess I'm missing some important details.

Solution

The point of STM is to speculate: we try running all the transactions hoping that they do not conflict with one another (or perform a retry). When we do discover a conflict, we allow some transactions to commit, while making the conflicting ones to rollback.

We could run only one transaction, wait for it to complete or to block, and then run another one, and so on, but doing so would amount to use a single "global lock", making the computation completely sequential.

More technically, when threads are waiting on a MVar, those threads will progress on a very simple condition: the MVar becoming non empty. On wake up, a thread will take the value, making it empty. So, at most one thread can perform the take, and there's no point in waking more than one.

By constrast, when threads are waiting because of STM, the condition is much more complex. Suppose they are waiting because they previously performed a retry, so they are waiting for some previously read TVar to be changed. When that happens, we can't really know which thread will block again unless we re-run its transaction. Unlike MVar, it is now possible that waking them all up will cause all of them to complete without conflict, so we try doing just that. In doing so we hope for many (if not all) to complete, and prepare to rollback again for those that do not.

Consider this concrete STM example:

Thread 1:
read X
if X is zero, retry
compute f(X)
write Y = f(X)

Thread 2:
read X
if X is zero, retry
compute g(X)
write Z = g(X)

Assume at the beginning we have "X=Y=Z=0". We run both threads, but they retry. Then a third thread writes X=1. We can wake both, and there's no point in doing so in FIFO order, since both will complete. We can, and should, compute f(X) and g(X) in parallel.

Now, if thread 2 wrote on Y instead of Z at the end, there would be a conflict. In this case, running the transactions in sequence would be more efficient (since otherwise we compute f(X) or g(X) twice). However, in the general case, it's hard to predict if there will be a conflict or not. STM does not attempt to, leaving to the programmer to optimize their code.