Tags: go, concurrency, channel, producer-consumer

Golang: How to tell whether producer or consumer is slower when communicating via buffered channels?


I have an app in Golang with a pipeline setup: each component performs some work, then passes its results to the next component via a buffered channel; that component performs some work on its input and passes its results to yet another component via another buffered channel, and so on. For example:

C1 -> C2 -> C3 -> ...

where C1, C2, C3 are components in the pipeline and each "->" is a buffered channel.

In Golang buffered channels are great because they force a fast producer to slow down to match its downstream consumer (or a fast consumer to slow down to match its upstream producer). So, like an assembly line, my pipeline moves only as fast as the slowest component in that pipeline.

The problem is I want to figure out which component in my pipeline is the slowest one so I can focus on improving that component in order to make the whole pipeline faster.

The way that Golang forces a fast producer or a fast consumer to slow down is by blocking: a producer blocks when it tries to send to a buffered channel that is full, and a consumer blocks when it tries to receive from a channel that is empty. Like this:

outputChan <- result  // producer would block here when sending to full channel

input := <- inputChan // consumer would block here when consuming from empty channel

This makes it hard to tell which one, the producer or the consumer, is blocking the most, and thus which is the slowest component in the pipeline, because I cannot tell how long each one is blocked for. The one that is blocked for the most time is the fastest component, and the one that is blocked the least (or not blocked at all) is the slowest component.

I can add code like this just before the read from or write to the channel to tell whether it would block:

// for producer
if len(outputChan) == cap(outputChan) {
    producerBlockingCount++
}
outputChan <- result

// for consumer
if len(inputChan) == 0 {
    consumerBlockingCount++
}
input := <-inputChan

However, that would only tell me the number of times the operation would have blocked, not the total amount of time spent blocked. Not to mention the TOCTOU issue: the check looks at a single point in time, and the channel's state can change immediately after the check, rendering the count incorrect or misleading.

Anybody who has ever been to a casino knows that it's not the number of times you win or lose that matters, it's the total amount of money you win or lose that really matters. I can lose 10 hands at $10 each (for a total loss of $100) and then win a single hand of $150, and I would still come out ahead.

Likewise, it's not the number of times a producer or consumer is blocked that's meaningful. It's the total amount of time a producer or consumer is blocked that determines whether it's the slowest component or not.

But I cannot think of any way to determine the total amount of time something is blocked when reading from / writing to a buffered channel. Or my google-fu isn't good enough. Does anyone have any bright ideas?


Solution

  • There are several solutions that spring to mind.

    1. stopwatch

    The least invasive and most obvious approach is to just note the time before and after each read or write. Log it, sum it, and report the total I/O delay. Similarly, report the elapsed processing time. (A minimal sketch appears after this list.)

    2. benchmark

    Do a synthetic benchmark where you have each stage operate on a million identical inputs, producing a million identical outputs. (A benchmark sketch also appears after this list.)

    Or do a "system test": wiretap the messages that flow through production, write them to log files, and replay the relevant log messages to each of your pipeline stages, measuring elapsed times. Because it is a replay, there will be no I/O throttling.

    3. pub/sub

    Re-architect to use a higher-overhead comms infrastructure, such as Kafka / 0mq / RabbitMQ. Change the number of nodes participating in stage-1 processing, stage-2, and so on. The idea is to overwhelm the stage currently under study, leaving it no idle cycles, so you can measure its transactions-per-second throughput when saturated.

    Alternatively, just distribute each stage to its own node and measure {user, sys, idle} times during normal system behavior.
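
Here is a minimal sketch of the stopwatch idea (option 1), assuming a toy stage that reads ints, does some stand-in work, and writes ints. The stageStats struct, the runStage function, and the multiply-by-two "work" are all illustrative names and placeholders, not anything from the question's code:

package main

import (
	"fmt"
	"time"
)

// stageStats accumulates where one pipeline component spends its time.
// (Illustrative names; not from the original code.)
type stageStats struct {
	waitingForInput  time.Duration // blocked on input := <-inputChan
	working          time.Duration // actual processing
	waitingForOutput time.Duration // blocked on outputChan <- result
}

// runStage is one pipeline component (think C2) with "stopwatch"
// instrumentation around every channel read and write.
func runStage(inputChan <-chan int, outputChan chan<- int, stats *stageStats) {
	for {
		t := time.Now()
		input, ok := <-inputChan // blocks here if the upstream producer is slower
		stats.waitingForInput += time.Since(t)
		if !ok {
			close(outputChan)
			return
		}

		t = time.Now()
		result := input * 2 // stand-in for this component's real work
		stats.working += time.Since(t)

		t = time.Now()
		outputChan <- result // blocks here if the downstream consumer is slower
		stats.waitingForOutput += time.Since(t)
	}
}

func main() {
	c1to2 := make(chan int, 10)
	c2to3 := make(chan int, 10)

	var stats stageStats
	go runStage(c1to2, c2to3, &stats)

	// Fast upstream producer.
	go func() {
		for i := 0; i < 100; i++ {
			c1to2 <- i
		}
		close(c1to2)
	}()

	// Deliberately slow downstream consumer, so the instrumented stage
	// should report most of its time as waitingForOutput.
	for range c2to3 {
		time.Sleep(2 * time.Millisecond)
	}

	fmt.Println("waiting for input: ", stats.waitingForInput)
	fmt.Println("working:           ", stats.working)
	fmt.Println("waiting for output:", stats.waitingForOutput)
}

A stage whose waiting totals are small relative to its working time is the bottleneck; a stage that spends most of its time in waitingForInput or waitingForOutput is being held back by a neighbour.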
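
And a minimal sketch of the synthetic benchmark (option 2), using Go's standard testing package. processC2 is a hypothetical stand-in for whatever work C2 does per message; you would write one benchmark like this per stage, in a file ending in _test.go:

package pipeline

import "testing"

// processC2 stands in for the per-message work of component C2.
// (Hypothetical function, used only for this sketch.)
func processC2(input int) int {
	return input * 2
}

// sink keeps the compiler from optimizing the benchmarked call away.
var sink int

// BenchmarkC2 measures C2's raw per-message cost with no channel I/O,
// so there is no backpressure from neighbouring stages.
func BenchmarkC2(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = processC2(i)
	}
}

Running go test -bench=. -benchtime=1000000x executes each benchmark exactly one million times; comparing the reported ns/op across the per-stage benchmarks points at the slowest stage.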