I'm trying to understand more about Go's channels and goroutines, so I decided to write a little program that counts the words in a file, which is read through a bufio.Scanner created with bufio.NewScanner:
nCPUs := flag.Int("cpu", 2, "number of CPUs to use")
flag.Parse()
runtime.GOMAXPROCS(*nCPUs)

scanner := bufio.NewScanner(file)
lines := make(chan string)
results := make(chan int)

for i := 0; i < *nCPUs; i++ {
	go func() {
		for line := range lines {
			fmt.Printf("%s\n", line)
			results <- len(strings.Split(line, " "))
		}
	}()
}

for scanner.Scan() {
	lines <- scanner.Text()
}

close(lines)

acc := 0
for i := range results {
	acc += i
}

fmt.Printf("%d\n", acc)
Now, in most examples I've found so far, both the lines and results channels would be buffered, such as make(chan int, NUMBER_OF_LINES_IN_FILE). My channels are unbuffered, and when I run this code my program exits with a "fatal error: all goroutines are asleep - deadlock!" error message.
My thinking is that I need two channels. One sends the lines of the file to the goroutines; since the file can be of any size, I would rather not have to specify a size in the make(chan) call. The other collects the results from the goroutines, and the main function uses it to, e.g., compute an accumulated result.
What is the best way to structure this kind of program with goroutines and channels? Any help is much appreciated.
As @AndrewN has pointed out, the problem is that each goroutine reaches the point where it tries to send to the results channel, but that send blocks because the results channel is unbuffered and nothing reads from it until the for i := range results loop. You never get to that loop, because you first need to finish the for scanner.Scan() loop, which is trying to send all the lines down the lines channel. That send is blocked in turn, because the goroutines never loop back to range lines: they're stuck sending to results.
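To make the blocking behaviour concrete, here is a minimal sketch. The helper trySend is a hypothetical name introduced only for this illustration; it uses select with a default case to report whether a send could proceed immediately. With no receiver waiting, the send on the unbuffered channel cannot proceed, which is the situation each worker is in when it sends to results.

```go
package main

import "fmt"

// trySend is a hypothetical helper for illustration only: it attempts a
// non-blocking send and reports whether the send could proceed right away.
func trySend(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		// A plain `ch <- v` would block here instead of returning.
		return false
	}
}

func main() {
	unbuffered := make(chan int)
	buffered := make(chan int, 1)

	fmt.Println(trySend(unbuffered, 1)) // false: no receiver is waiting
	fmt.Println(trySend(buffered, 1))   // true: the buffer has room
	fmt.Println(trySend(buffered, 2))   // false: the buffer is now full
}
```

A buffer only postpones the point at which the sender blocks, so the fix has to be structural: something must be receiving from results while the lines are still being sent.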
The first thing you might try to fix this is to put the scanner.Scan() loop in a goroutine, so that something can start reading off the results channel right away. However, the next problem you'll have is knowing when to end the for i := range results loop. You want something to close the results channel, but only after the original goroutines are done reading off the lines channel. You could close the results channel right after closing the lines channel, but the workers may still be sending at that point, and a send on a closed channel panics. The safest thing to do is to also wait for the original two goroutines to be done before closing the results channel:
package main

import (
	"bufio"
	"fmt"
	"runtime"
	"strings"
	"sync"
)

func main() {
	runtime.GOMAXPROCS(2)

	scanner := bufio.NewScanner(strings.NewReader(`
hi mom
hi dad
hi sister
goodbye`))
	lines := make(chan string)
	results := make(chan int)

	// Workers: read lines until the lines channel is closed.
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for line := range lines {
				fmt.Printf("%s\n", line)
				results <- len(strings.Split(line, " "))
			}
		}()
	}

	// Producer: feed the lines from a goroutine so that main can start
	// receiving results right away.
	go func() {
		for scanner.Scan() {
			lines <- scanner.Text()
		}
		close(lines)

		// Close results only after every worker has finished sending.
		wg.Wait()
		close(results)
	}()

	// The range ends once results is closed.
	acc := 0
	for i := range results {
		acc += i
	}

	fmt.Printf("%d\n", acc)
}