Appending a one-byte array takes dramatically fewer allocations than a two-byte array

According to the Go benchmarks shown below (run with go test -v -bench=. -benchmem), adding a one-byte array to a []any slice costs 1 allocation per operation.

Benchmark_AddOneByteArray-4              1986709           581.2 ns/op      1792 B/op          1 allocs/op

However, adding a 2-byte array costs 101 allocations per operation, the same as adding a 32-byte array:

Benchmark_AddTwoByteArray
Benchmark_AddTwoByteArray-4               529726          2235 ns/op        1992 B/op        101 allocs/op
Benchmark_AddThirtyTwoByteArray
Benchmark_AddThirtyTwoByteArray-4         282092          4431 ns/op        4992 B/op        101 allocs/op

Why does adding a one-byte array cost only 1 allocation per op, whereas all the other sizes tested cost 101 allocations per op?

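// Assumed to live in a _test.go file; the package name is arbitrary.
package main

import "testing"

func addOneByteArray(n int) []any {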
    my_array := make([]any, n)

    for i := 0; i < n; i++ {
        my_array[i] = [1]byte{}
    }
    return my_array
}

func addTwoByteArray(n int) []any {
    my_array := make([]any, n)

    for i := 0; i < n; i++ {
        my_array[i] = [2]byte{}
    }
    return my_array
}

func addThirtyTwoByteArray(n int) []any {
    my_array := make([]any, n)

    for i := 0; i < n; i++ {
        my_array[i] = [32]byte{}
    }
    return my_array
}

var N = 100

func Benchmark_AddOneByteArray(b *testing.B) {
    for i := 0; i < b.N; i++ {
        _ = addOneByteArray(N)
    }
}

func Benchmark_AddTwoByteArray(b *testing.B) {
    for i := 0; i < b.N; i++ {
        _ = addTwoByteArray(N)
    }
}

func Benchmark_AddThirtyTwoByteArray(b *testing.B) {
    for i := 0; i < b.N; i++ {
        _ = addThirtyTwoByteArray(N)
    }
}

Solution

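It's an optimization for interface values that hold a single byte.

Normally, an interface is represented as a data block with two pointers: one points to the type's metadata, and the other points to the value. Storing a non-pointer value in an interface therefore usually means allocating a copy of it on the heap, which is where the extra 100 allocations (one per element, on top of the one for the slice itself) come from. Being smaller than a pointer is not enough to avoid this, as the [2]byte benchmark shows. But when the value is a single byte, there's an optimization: as far as I can tell, the runtime keeps a static table covering all 256 possible byte values (staticuint64s in the runtime source), and the interface's data pointer can simply refer to the matching entry instead of a fresh allocation.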
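The two-word layout can be inspected with the unsafe package. The following is only a sketch: the eface struct, dataPointer, and interfaceWords are names made up for this example, and the layout they assume is an implementation detail of the gc toolchain, not something the language specification promises. Whether the two [1]byte boxes report the same data pointer depends on the Go version in use.

```go
package main

import (
	"fmt"
	"unsafe"
)

// eface mirrors the assumed two-word layout of an empty interface (any).
// This is an implementation detail, not a documented API.
type eface struct {
	typ  unsafe.Pointer // pointer to the type metadata
	data unsafe.Pointer // pointer to the stored value
}

// dataPointer returns the data word of an interface value.
func dataPointer(v any) unsafe.Pointer {
	return (*eface)(unsafe.Pointer(&v)).data
}

// interfaceWords reports how many pointer-sized words an any occupies.
func interfaceWords() uintptr {
	var v any
	return unsafe.Sizeof(v) / unsafe.Sizeof(uintptr(0))
}

// Package-level variables, so the stored values escape to the heap.
var a, b, c, d any

func main() {
	fmt.Println("words per interface value:", interfaceWords())

	a, b = [1]byte{7}, [1]byte{7}
	c, d = [2]byte{7, 7}, [2]byte{7, 7}

	// If single-byte values are served from shared static storage,
	// a and b may point at the same address; c and d need not.
	fmt.Println("[1]byte values share a data pointer:", dataPointer(a) == dataPointer(b))
	fmt.Println("[2]byte values share a data pointer:", dataPointer(c) == dataPointer(d))
}
```

The comparison results are deliberately not listed here, since they can differ between toolchains.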

Thanks to @rocka2q and @coxley for the clues.

Note that this behavior is built into how Go converts values to interfaces and relies on the Go runtime; it is not an optional compiler optimization, so changing compiler optimization options shouldn't make a difference.
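A quick way to check this on a particular Go version and with particular build flags is to count allocations directly, without the benchmark harness. The sketch below uses testing.AllocsPerRun from the standard library; sink, allocsOneByte, and allocsTwoBytes are names invented for this example, and the numbers it prints are not guaranteed to match the benchmark results on every toolchain.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is package-level so that whatever is stored in it escapes
// and cannot be kept on the stack.
var sink any

// allocsOneByte reports the average number of heap allocations made
// when a [1]byte is stored in an interface variable.
func allocsOneByte() float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = [1]byte{}
	})
}

// allocsTwoBytes does the same for a [2]byte.
func allocsTwoBytes() float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = [2]byte{}
	})
}

func main() {
	fmt.Println("[1]byte allocations per conversion:", allocsOneByte())
	fmt.Println("[2]byte allocations per conversion:", allocsTwoBytes())
}
```

Running it once normally and once with different -gcflags settings gives a direct comparison for the toolchain at hand.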

For more information, see https://g4s8.wtf/posts/go-low-latency-one/, but note that what it says about function arguments is also true about interface variables.