Search code examples
goconcurrencygoroutine

Loop to check condition in concurrent program


I'm reading a book about concurrency in Go (I'm learning it now) and I found this code:

c  :=  sync.NewCond(&sync.Mutex{})
queue  :=  make([]interface{},  0,  10)
removeFromQueue  :=  func(delay  time.Duration) {
    time.Sleep(delay)
    c.L.Lock()
    queue  =  queue[1:]
    fmt.Println("Removed from queue")
    c.L.Unlock() c.Signal()
} 

for  i  :=  0;  i  <  10;  i++ {
    c.L.Lock()

    // Why this loop?
    for  len(queue)  ==  2 {
        c.Wait()
    }

    fmt.Println("Adding to queue")
    queue  =  append(queue,  struct{}{})
    go  removeFromQueue(1*time.Second)
    c.L.Unlock()
}

The problem is that I don't understand why the author introduces the for loop marked by the comment. As far as I can see, the program would be correct without it, but the author says that the loop is there because Cond will signal that something has happened only, but that doesn't mean that the state has truly changed.

In what case could that be possible?


Solution

  • Without the actual book at hand, and instead just some code snippets that seem out of context, it is hard to say what the author had in mind in particular. But we can guess. There is a general point about condition variables in most languages, including Go: waiting for some condition to be satisfied does require a loop in general. In some specific cases, the loop is not required.

    The Go documentation is, I think, clearer about this. In particular, the text description for sync's func (c *Cond) Wait() says:

    Wait atomically unlocks c.L and suspends execution of the calling goroutine. After later resuming execution, Wait locks c.L before returning. Unlike in other systems, Wait cannot return unless awoken by Broadcast or Signal.

    Because c.L is not locked when Wait first resumes, the caller typically cannot assume that the condition is true when Wait returns. Instead, the caller should Wait in a loop:

    c.L.Lock()
    for !condition() {
        c.Wait()
    }
    ... make use of condition ...
    c.L.Unlock()
    

    I added bold emphasis to the phrase that explains the reason for the loop.

    Whether you can omit the loop depends on more than one thing:

    • Under what condition(s) does another goroutine invoke Signal and/or Broadcast?
    • How many goroutines are running, and what might they be doing in parallel?

    As the Go documentation says, there's one case we don't have to worry about in Go, that we might in some other systems. In some systems, the equivalent of Wait is sometimes resumed (via the equivalent of Signal) when Signal (or its equivalent) has not actually been invoked on the condition variable.

    The queue example you've quoted is particularly odd because there is only one goroutine—the one running the for loop that counts to ten—that can add entries to the queue. The remaining goroutines only remove entries. So if the queue length is 2, and we pause and wait for a signal that the queue length has changed, the queue length can only have changed to either one or zero: no other goroutine can add to it and only the two goroutines we have created at this point can remove from it. This means that given this particular example, we have one of those cases where the loop is not required after all.

    (It's also odd in that queue is given an initial capacity of 10, which is as many items as we'll put in, and then we start waiting when its length is exactly 2, so that we should not reach that capacity anyway. If we were to spin off additional goroutines that might add to the queue, the loop that waits while len(queue) == 2 could indeed be signaled by a removal that drops the count from 2 to 1 but not get a chance to resume until insertion occurs, pushing the count back up to 2. However, depending on the situation, that loop might not be resumed until two other goroutines have each added an entry, pushing the count to 3, for instance. So why repeat the loop when the length is exactly two? If the idea is to preserve queue slots, we should loop while the count is greater than or equal to 2.)

    (Besides all this, the initial capacity is not relevant as the queue will be dynamically resized to a large slice if necessary.)