Tags: go, concurrency, mutex, shared-memory, atomic

What happens when reading or writing concurrently without a mutex


In Go, a sync.Mutex or a channel is used to prevent concurrent access to shared objects. However, in some cases I am just interested in the "latest" value of a variable or field of an object. Or I would like to write a value and do not care if another goroutine overwrites it later or has just overwritten it before.

Update (TL;DR): Just don't do this. It is not safe. Read the answers, comments, and linked documents!

Update 2021: The Go memory model is going to be specified more thoroughly and there are three great articles by Russ Cox that will teach you more about the surprising effects of unsynchronized memory access. These articles summarize a lot of the below discussions and learnings.

Here are two variants, good and bad, of an example program. Both seem to produce "correct" output using the current Go runtime:

package main

import (
    "flag"
    "fmt"
    "math/rand"
    "time"
)

var bogus = flag.Bool("bogus", false, "use bogus code")

func pause() {
    time.Sleep(time.Duration(rand.Uint32()%100) * time.Millisecond)
}

func bad() {
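    // Closing stop wakes every receiver; a time.After channel delivers
    // only one value, so main and the consumer could not both see it.
    stop := make(chan struct{})
    time.AfterFunc(100*time.Millisecond, func() { close(stop) })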
    var name string

    // start some producers doing concurrent writes (DANGER!)
    for i := 0; i < 10; i++ {
        go func(i int) {
            pause()
            name = fmt.Sprintf("name = %d", i)
        }(i)
    }

    // start consumer that shows the current value every 10ms
    go func() {
        tick := time.Tick(10 * time.Millisecond)
        for {
            select {
            case <-stop:
                return
            case <-tick:
                fmt.Println("read:", name)
            }
        }
    }()

    <-stop
}

func good() {
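    // Closing stop wakes every receiver; a time.After channel delivers
    // only one value, so main and the consumer could not both see it.
    stop := make(chan struct{})
    time.AfterFunc(100*time.Millisecond, func() { close(stop) })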
    names := make(chan string, 10)

    // start some producers concurrently writing to a channel (GOOD!)
    for i := 0; i < 10; i++ {
        go func(i int) {
            pause()
            names <- fmt.Sprintf("name = %d", i)
        }(i)
    }

    // start consumer that shows the current value every 10ms
    go func() {
        tick := time.Tick(10 * time.Millisecond)
        var name string
        for {
            select {
            case name = <-names:
            case <-stop:
                return
            case <-tick:
                fmt.Println("read:", name)
            }
        }
    }()

    <-stop
}

func main() {
    flag.Parse()
    if *bogus {
        bad()
    } else {
        good()
    }
}

The expected output is as follows:

...
read: name = 3
read: name = 3
read: name = 5
read: name = 4
...

Any combination of "read:" (with an empty name) and "read: name = [0-9]" lines is correct output for this program. Any other string in the output would be an error.

When running this program with go run --race bogus.go, the race detector reports nothing.

However, go run --race bogus.go -bogus warns of the concurrent reads and writes.

For map types and when appending to slices I always need a mutex or a similar method of protection to avoid segfaults or unexpected behavior. However, reading and writing literals (atomic values) to variables or field values seems to be safe.

Question: Which Go data types can I safely read and write concurrently without a mutex, without producing segfaults, and without reading garbage from memory?

Please explain why something is safe or unsafe in Go in your answer.

Update: I rewrote the example to better reflect the original code, where I had the concurrent writes issue. The important learnings are already in the comments. I will accept an answer that summarizes these learnings in enough detail (esp. regarding the Go runtime).


Solution

  • However, in some cases I am just interested in the latest value of a variable or field of an object.

    Here is the fundamental problem: What does the word "latest" mean?

    Suppose that, mathematically speaking, we have a sequence of values Xi, with 0 <= i < N. Then obviously Xj is "later than" Xi if j > i. That's a nice simple definition of "latest" and is probably the one you want.

    But when two separate CPUs within a single machine—including two goroutines in a Go program—are working at the same time, time itself loses meaning. We cannot say whether i < j, i == j, or i > j. So there is no correct definition for the word latest.

    To solve this kind of problem, modern CPU hardware, and Go as a programming language, give us certain synchronization primitives. If CPUs A and B execute memory fence instructions, or synchronization instructions, or use whatever other hardware provisions exist, the CPUs (and/or some external hardware) will insert whatever is required for the notion of "time" to regain its meaning. That is, if the CPU uses barrier instructions, we can say that a memory load or store that was executed before the barrier is a "before" and a memory load or store that is executed after the barrier is an "after".

    (The actual implementation, in some modern hardware, consists of load and store buffers that can rearrange the order in which loads and stores go to memory. The barrier instruction either synchronizes the buffers, or places an actual barrier in them, so that loads and stores cannot move across the barrier. This particular concrete implementation gives an easy way to think about the problem, but isn't complete: you should think of time as simply not existing outside the hardware-provided synchronization, i.e., all loads from, and stores to, some location are happening simultaneously, rather than in some sequential order, except for these barriers.)

    In any case, Go's sync package gives you a simple high level access method to these kinds of barriers. Compiled code that executes before a mutex Lock call really does complete before the lock function returns, and the code that executes after the call really does not start until after the lock function returns.
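As a minimal sketch of the mutex approach applied to the question's shared name variable: the latestName type and its Set and Get methods are hypothetical names introduced here for illustration, not part of the original program. Every write and every read happens while holding the lock, so a reader always observes a complete string written by some goroutine, and "latest" means "the last write whose Unlock came before this Lock".

```go
package main

import (
    "fmt"
    "sync"
)

// latestName is a hypothetical helper: a string guarded by a mutex.
type latestName struct {
    mu   sync.Mutex
    name string
}

// Set stores s while holding the lock.
func (l *latestName) Set(s string) {
    l.mu.Lock()
    defer l.mu.Unlock()
    l.name = s
}

// Get returns the value stored by the most recent Set that
// released the lock before this call acquired it.
func (l *latestName) Get() string {
    l.mu.Lock()
    defer l.mu.Unlock()
    return l.name
}

func main() {
    var ln latestName
    var wg sync.WaitGroup

    // Ten producers write concurrently; the mutex serializes them.
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            ln.Set(fmt.Sprintf("name = %d", i))
        }(i)
    }

    wg.Wait()
    // Which producer wrote last is not deterministic, but the value
    // is always one of the ten complete strings.
    fmt.Println("read:", ln.Get())
}
```

Which of the ten values is printed still varies from run to run. The mutex removes the data race but does not pick a winner among the writers.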

    Go's channels provide the same kinds of before/after time guarantees.
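To illustrate the channel guarantee, here is a small sketch in which a plain variable is written by one goroutine and read by another, with a channel close in between. The publish function is a made-up name for this example. Under the Go memory model, closing a channel is synchronized before a receive that returns because the channel is closed, so the read after the receive is ordered after the write before the close.

```go
package main

import "fmt"

// publish (hypothetical helper) writes a plain variable in one
// goroutine and reads it in another, ordered by a channel close.
func publish() string {
    var name string
    done := make(chan struct{})

    go func() {
        name = "name = 1" // ordinary write to a shared variable
        close(done)       // the close happens after the write
    }()

    <-done      // returns only once done has been closed
    return name // therefore ordered after the write above
}

func main() {
    fmt.Println("read:", publish()) // prints "read: name = 1"
}
```

Without the receive from done, the same read would be a data race and the race detector would be expected to flag it.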

    Go's sync/atomic package provides much lower level guarantees. In general you should avoid this in favor of the higher level channel or sync.Mutex style guarantees. (Edit to add note: You could use sync/atomic's Pointer operations here, but not with the string type directly, as Go strings are actually implemented as a header containing two separate values: a pointer, and a length. You could solve this with another layer of indirection, by updating a pointer that points to the string object. But before you even consider doing that, you should benchmark the use of the language's preferred methods and verify that these are a problem, because code that works at the sync/atomic level is hard to write and hard to debug.)
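For completeness, the extra layer of indirection could look roughly like the sketch below. It assumes Go 1.19 or later, where sync/atomic offers the generic atomic.Pointer type. The setName and getName functions are hypothetical names for this example. Each stored string is treated as immutable: writers publish a pointer to a fresh string, and readers load whichever pointer is current.

```go
package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

// name holds a pointer to an immutable string (requires Go 1.19+).
var name atomic.Pointer[string]

// setName publishes a new string by atomically swapping the pointer.
// s is a fresh copy per call, so &s is never shared between writers.
func setName(s string) {
    name.Store(&s)
}

// getName loads the current pointer; nil means nothing was stored yet.
func getName() string {
    p := name.Load()
    if p == nil {
        return ""
    }
    return *p
}

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            setName(fmt.Sprintf("name = %d", i))
        }(i)
    }
    wg.Wait()
    // One of the ten values; which one is not deterministic.
    fmt.Println("read:", getName())
}
```

This only stays correct as long as nobody modifies a string through the pointer after storing it. As noted above, it is worth measuring the mutex or channel version first before reaching for this.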