Why does a goroutine like the following print garbled, out-of-order sequences of bytes when it reads from a buffered channel?
Here is the code to reproduce the buggy behaviour, where data.csv is a simple CSV file with 1000 rows of random data (approximately 100 bytes per row) plus a header row (1001 rows in total):
```go
package main

import (
	"bufio"
	"os"
	"time"
)

func main() {
	var channelLength = 10000
	var channel = make(chan []byte, channelLength)
	go func() {
		for c := range channel {
			println(string(c))
		}
	}()
	file, _ := os.Open("./data.csv")
	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		channel <- scanner.Bytes()
	}
	<-time.After(time.Second * time.Duration(3600))
}
```
Here are the first 6 lines of the output, as an example of what I mean by "broken output":
```
979,C
tharine,Vero,cveror6@blinklist.com,Female,133.153.12.53
980,Mauriz
a,Ilett,milettr7@theguardian.com,Female,226.123.252.118
981
Sher,De Laci,sdelacir8@nps.gov,Female,137.207.30.217
[...]
```
On the other hand, the code runs smoothly if channelLength = 0, i.e. with an unbuffered channel (first 6 lines again):
```
id,first_name,last_name,email,gender,ip_address
1,Hebert,Edgecumbe,hedgecumbe0@apple.com,Male,108.84.217.38
2,Minor,Lakes,mlakes1@marriott.com,Male,231.185.189.39
3,Faye,Spurdens,fspurdens2@oakley.com,Female,80.173.161.81
4,Kris,Proppers,kproppers3@gmpg.org,Male,10.80.182.51
5,Bronnie,Branchet,bbranchet4@squarespace.com,Male,118.117.0.5
[...]
```
The data is randomly generated.
From the bufio.Scanner docs (for the Bytes method):

> The underlying array may point to data that will be overwritten by a subsequent call to Scan.
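You have a data race around the use of the slices you're passing over the channel: every slice returned by scanner.Bytes() shares the scanner's internal buffer, so once the scanner reads more input, rows still waiting in the channel can be overwritten. A large channel buffer lets the scanner run far ahead of the printing goroutine, which is why the corruption is so visible; an unbuffered channel only narrows the window, it does not remove the race. You need to copy the data you're sending. In this example, that is most easily accomplished by using a string instead of []byte, and calling scanner.Text(), which returns a newly allocated string for each token.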