How to use channels efficiently

I read in Uber's Go style guide that a channel should have a size of at most 1.

Although it's clear to me that a channel size of 100 or 1000 is very bad practice, I was wondering why a channel size of 10 isn't considered a valid option. I'm missing some piece of the reasoning needed to reach the right conclusion.

Below, you can follow my arguments (and counterarguments), backed by some benchmark tests.

I understand that if both goroutines (the one writing to and the one reading from this channel) are interrupted between sequential writes or reads by some other I/O action, no gain is to be expected from a larger channel buffer, and I agree that 1 is the best option.

But let's say that no significant goroutine switching is needed apart from the implicit locking and unlocking caused by writing to and reading from the channel. Then I would conclude the following:

Consider the number of context switches when processing 100 values on a channel with a buffer size of either 1 or 10 (GR = goroutine):

Buffer=1: (GR1 inserts 1 value, GR2 reads 1 value) × 100 ≈ 200 goroutine switches
Buffer=10: (GR1 inserts 10 values, GR2 reads 10 values) × 10 ≈ 20 goroutine switches
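The idealized count above can be written down as a tiny Go function. `handoffs` is a hypothetical name, and the model is the question's own assumption: it pretends the producer always fills the buffer completely before the consumer drains it, which the real Go scheduler does not guarantee.

```go
package main

// handoffs models the question's idealized number of goroutine switches
// for n values on a channel with the given buffer size: each batch of up
// to size values costs one switch to the consumer and one back.
// Assumption for illustration only; real scheduling is far less regular.
func handoffs(n, size int) int {
	if size < 1 {
		// Treat an unbuffered channel like size 1: one handoff per value.
		size = 1
	}
	batches := (n + size - 1) / size // ceil(n / size)
	return 2 * batches
}
```

With this model, `handoffs(100, 1)` gives 200 and `handoffs(100, 10)` gives 20, matching the two lines above.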

I ran some benchmarks to check whether this is actually faster:

package main

import (
    "testing"
)

type a struct {
    b [100]int64
}

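func BenchmarkBuffer1(b *testing.B) {
    count := 0
    c := make(chan a, 1)
    // Producer: sends b.N values, then closes the channel.
    go func() {
        for i := 0; i < b.N; i++ {
            c <- a{}
        }
        close(c)
    }()
    // Consumer: drains the channel and does a little work per value.
    for v := range c {
        for i := range v.b {
            count += i
        }
    }
}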

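func BenchmarkBuffer10(b *testing.B) {
    count := 0
    c := make(chan a, 10)
    // Producer: sends b.N values, then closes the channel.
    go func() {
        for i := 0; i < b.N; i++ {
            c <- a{}
        }
        close(c)
    }()
    // Consumer: drains the channel and does a little work per value.
    for v := range c {
        for i := range v.b {
            count += i
        }
    }
}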
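If you want to keep measuring this, a tidier sketch of the benchmarks above could put the producer/consumer pair in one helper parameterized by buffer size, and store the consumer's result in a package-level variable so the work stays observable. The helper `benchmarkBuffer`, the type `payload` and the variable `sink` are hypothetical names introduced here; the producer and consumer logic is the same as in the question.

```go
package main

import "testing"

// payload mirrors the question's struct a: 100 int64 values copied per send.
type payload struct {
	b [100]int64
}

// sink receives the consumer's result so the receive loop's work is
// observable after the benchmark returns.
var sink int

// benchmarkBuffer runs the question's producer/consumer pair over a
// channel with the given capacity (0 means unbuffered).
func benchmarkBuffer(b *testing.B, size int) {
	c := make(chan payload, size)
	b.ResetTimer()
	go func() {
		for i := 0; i < b.N; i++ {
			c <- payload{}
		}
		close(c)
	}()
	count := 0
	for v := range c {
		for i := range v.b {
			count += i
		}
	}
	sink = count
}

func BenchmarkBuffer0(b *testing.B) { benchmarkBuffer(b, 0) }

func BenchmarkBuffer1(b *testing.B) { benchmarkBuffer(b, 1) }

func BenchmarkBuffer10(b *testing.B) { benchmarkBuffer(b, 10) }
```

Saved in a `_test.go` file, this can be run with e.g. `go test -bench=Buffer -count=5`; comparing repeated runs (for example with the `benchstat` tool) gives a better picture of the noise than single numbers.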

Results for plain reading and writing with non-blocking processing (three runs):

BenchmarkBuffer1-12              5072902               266 ns/op
BenchmarkBuffer10-12             6029602               179 ns/op
PASS
BenchmarkBuffer1-12              5228782               256 ns/op
BenchmarkBuffer10-12             5392410               216 ns/op
PASS
BenchmarkBuffer1-12              4806208               287 ns/op
BenchmarkBuffer10-12             4637842               233 ns/op
PASS

However, if I add a sleep every 10 reads, the larger buffer no longer yields better results.

package main

import (
    "testing"
    "time"
)

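func BenchmarkBuffer1WithSleep(b *testing.B) {
    count := 0
    c := make(chan int, 1)
    // Producer: sends the values 0..b.N-1, then closes the channel.
    go func() {
        for i := 0; i < b.N; i++ {
            c <- i
        }
        close(c)
    }()
    for a := range c {
        count++
        // Every 10th value, sleep for a nanoseconds. Since a is the received
        // value, the pause grows with the loop index (and thus with b.N).
        if count%10 == 0 {
            time.Sleep(time.Duration(a) * time.Nanosecond)
        }
    }
}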

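func BenchmarkBuffer10WithSleep(b *testing.B) {
    count := 0
    c := make(chan int, 10)
    // Producer: sends the values 0..b.N-1, then closes the channel.
    go func() {
        for i := 0; i < b.N; i++ {
            c <- i
        }
        close(c)
    }()
    for a := range c {
        count++
        // Every 10th value, sleep for a nanoseconds. Since a is the received
        // value, the pause grows with the loop index (and thus with b.N).
        if count%10 == 0 {
            time.Sleep(time.Duration(a) * time.Nanosecond)
        }
    }
}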
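In the sleep benchmarks above, the pause is `time.Duration(a) * time.Nanosecond` and `a` is the received value, so the pause grows with the loop index and therefore with `b.N`; the reported ns/op then depends on how many iterations the framework happened to choose. A sketch with a fixed pause follows. `benchmarkBufferWithSleep`, `sinkSleep` and the 1µs duration are arbitrary, hypothetical choices, and `time.Sleep` only guarantees a pause of at least the requested duration.

```go
package main

import (
	"testing"
	"time"
)

// sinkSleep receives the number of values the consumer processed.
var sinkSleep int

// benchmarkBufferWithSleep pauses the consumer for a fixed duration d
// every `every` receives, instead of a pause that grows with the value.
func benchmarkBufferWithSleep(b *testing.B, size, every int, d time.Duration) {
	c := make(chan int, size)
	b.ResetTimer()
	go func() {
		for i := 0; i < b.N; i++ {
			c <- i
		}
		close(c)
	}()
	count := 0
	for range c {
		count++
		if count%every == 0 {
			time.Sleep(d)
		}
	}
	sinkSleep = count
}

func BenchmarkBuffer1FixedSleep(b *testing.B) {
	benchmarkBufferWithSleep(b, 1, 10, time.Microsecond)
}

func BenchmarkBuffer10FixedSleep(b *testing.B) {
	benchmarkBufferWithSleep(b, 10, 10, time.Microsecond)
}
```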

Results when adding a sleep every 10 reads:

BenchmarkBuffer1WithSleep-12              856886             53219 ns/op
BenchmarkBuffer10WithSleep-12             929113             56939 ns/op

FYI: I also ran the tests again with only one CPU and got the following results:

BenchmarkBuffer1                 5831193               207 ns/op
BenchmarkBuffer10                6226983               180 ns/op
BenchmarkBuffer1WithSleep         556635             35510 ns/op
BenchmarkBuffer10WithSleep        984472             61434 ns/op

Solution

There is absolutely nothing wrong with a channel of cap 500, e.g. if the channel is used as a semaphore.

The style guide you read recommends against using buffered channels of, say, cap 64 just "because this looks like a nice number". But this recommendation is not about performance! (BTW: your microbenchmarks are useless microbenchmarks; they do not measure anything relevant.)

An unbuffered channel is a kind of synchronisation primitive and as such very useful.
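A minimal sketch of an unbuffered channel used as a synchronisation primitive: each send blocks until the other goroutine receives, so the two goroutines proceed in lockstep. `squareInLockstep` is a hypothetical example function.

```go
package main

// squareInLockstep hands each value to a worker goroutine over an
// unbuffered channel and waits for the worker's reply before sending
// the next one.
func squareInLockstep(values []int) []int {
	// Unbuffered: a send on in completes only when the worker receives it.
	in := make(chan int)
	// Unbuffered: the worker's reply completes only when we receive it.
	out := make(chan int)
	go func() {
		for v := range in {
			out <- v * v
		}
		close(out)
	}()
	results := make([]int, 0, len(values))
	for _, v := range values {
		// Returns only after the worker has taken v.
		in <- v
		results = append(results, <-out)
	}
	close(in)
	return results
}
```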

A buffered channel, well, may buffer between sender and receiver, and this buffering can make the code harder to observe, tune and debug (because production and consumption are further decoupled). That's why the style guide recommends unbuffered channels (or at most a cap of 1, as this is sometimes needed for correctness!).

It also doesn't prohibit larger buffer caps:

Any other [than 0 or 1] size must be subject to a high level of scrutiny. Consider how the size is determined, what prevents the channel from filling up under load and blocking writers, and what happens when this occurs. [emph. mine]

You may use a cap of 27 if you can explain why 27 (and not 22 or 31) and how this will influence program behaviour (not only performance!) when the buffer fills up.
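A sketch of what answering "what happens when it fills up" can look like in code: the capacity is a named constructor parameter, and the full-buffer case is an explicit, counted drop rather than a silently blocked writer. `eventQueue`, `newEventQueue` and `tryPublish` are hypothetical, and dropping is only one possible policy; blocking, returning an error or shedding load upstream may be more appropriate depending on the program.

```go
package main

// eventQueue is a bounded queue whose capacity is an explicit parameter
// and whose full-buffer behaviour is a deliberate, counted drop.
type eventQueue struct {
	ch      chan string
	dropped int // only touched by a single producer goroutine in this sketch
}

func newEventQueue(capacity int) *eventQueue {
	return &eventQueue{ch: make(chan string, capacity)}
}

// tryPublish enqueues ev if there is room and reports whether it did.
// When the buffer is full, the select takes the default branch instead
// of blocking, and the drop is recorded so it can be monitored.
func (q *eventQueue) tryPublish(ev string) bool {
	select {
	case q.ch <- ev:
		return true
	default:
		q.dropped++
		return false
	}
}
```

The capacity passed to `newEventQueue` would still need a justification of its own, for example the size of a burst the consumer is expected to absorb.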

Most people overrate performance. Correctness, operational stability and maintainability come first, and that is what the style guide is concerned with here.