I have implemented a complex csv import script in Golang. I use a Workerpool implementation for it. Inside that workerpool, workers run through 1000s of small csv files, categorizing, tagging and branding the products. And they all write to the same database table. So far so good.
The problem i'm facing is, that if i chose more than 2 workers, the process crashes with the following message randomly
The workflow is
foreach (csv) {
workerPool.submit(csv)
}
func worker(csv) {
foreach (line) {
import(line)
}
}
import(line) {
product = get(line)
product.category = determine_category(product)
product.brand = determine_brand(product)
save(brand)
product.tags = determine_tags(product)
//and after all
save(product)
}
I tried to wrap the save() calls in transactions, but it didn't help.
Now i have the following questions:
I'm completely stuck. I can also share the sourcecode if that helps. But I first wanted to know i you guess that it's rather my code or a general issue.
Open a new db connection in each goroutine (or thread, for languages that use threads).
MySQL's protocol is stateful, which means if multiple goroutines attempt to use the same connection, the requests and responses get very confused.
You would have the same problem trying to share any other kind of stateful protocol connection between goroutines.
For example ftp is also a stateful protocol, and that may be easier to understand. A client goroutine might send a message like "get file x" and the response should be a series of messages containing the content of that file. If another goroutine tries to use the same connection while that request/response is inprogress, both clients will be confused. The second goroutine will read packets that belong to a file it didn't request. The first goroutine who requested the file will find some packets it was expecting have already been read.
Similarly, MySQL's protocol does not support multiple client goroutines sharing a single connection.