multithreading haskell lazy-evaluation ghc thunk

How atomic are GHC's thunks?

How does GHC handle thunks that are accessed by multiple threads (either explicit threads, or the internal ones that evaluate sparks)? Can it happen that multiple threads evaluate the same thunk, duplicating work? Or, if thunks are synchronized, how, so that performance doesn't suffer?

Solution

From the blog post @Lambdageek linked, the GHC Commentary and the GHC User's Guide I piece together the following:

GHC tries to prevent reevaluating thunks, but because true locking between threads is expensive, and thunks are usually pure and so harmless to reevaluate, it normally does so in a sloppy manner, with a small chance of duplicating work anyhow.

The method it uses for avoiding work is to replace thunks by a blackhole, a special marker that tells other threads (or sometimes, the thread itself; that's how <<loop>> detection happens) that the thunk is being evaluated.

Given this, there are at least three options:

By default, it uses "lazy blackholing", where this is only done before the thread is about to pause. It then "walks" its stack and creates "true" blackholes for new thunks, using locking to ensure each thunk only gets one thread blackholing it, and aborting its own evaluation if it discovers another thread has already blackholed a thunk. This is cheaper as it doesn't need to consider thunks whose evaluation time is so short that it fits completely between two pauses.
With the -feager-blackholing-flag, blackholes are instead created as soon as a thunk starts evaluating, and the User's Guide recommends this if you are doing a lot of parallelism. However, because locking on every thunk would be too expensive, these blackholes are cheaper "eager" ones, which are not synchronized with other threads (although other threads can still see them if there's not a race condition). Only when the thread pauses do these get turned into "true" blackholes.
A third case, which the blog post was particularly about, is used for special functions like unsafePerformIO, for which it's harmful to evaluate a thunk more than once. In this case, the thread uses a "true" blackhole with real locking, but creates it immediately, by inserting an artificial thread pause before the real evaluation.