Tags: windows, performance, language-agnostic, cluster-computing, diagnostics

Diagnosing pathological behavior of a piece of cluster software


I'm using a kind of load balancer over a small cluster that can achieve >2000 rps on zero-duration requests (i.e. ones that are satisfied by the worker nodes immediately). But as soon as the requests stop being zero-duration and start taking even 1 ms, throughput immediately drops by more than 10x. The data transferred in both directions is identical and is about 2 KB in size. This is definitely not saturation of the cluster or of the network: 200 rps of 1 ms requests is a very light load, and the network is 10 Gbit. Besides, CPU load is only about 2-5% on both the load balancer and the worker nodes.

I wonder whether that might be related to some pathological behavior of the OS scheduler or the OS network stack (i.e. some special-case behavior for very short interactions).

How might I diagnose the cause? Which performance counters should I watch? Which tools or methodologies should I use?
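For example, I could start by sampling a few generic CPU/network/TCP counters with PowerShell while reproducing both the fast and the slow case, as in the sketch below, but I don't know whether these are the right counters to look at:

    # Suggested (not exhaustive) set of counters to watch on both the broker
    # and a worker node while reproducing the fast and slow cases.
    $counters = @(
        '\Processor(_Total)\% Processor Time',
        '\Network Interface(*)\Bytes Total/sec',
        '\TCPv4\Connections Established',
        '\TCPv4\Segments Retransmitted/sec'
    )

    # One sample per second for a minute; compare the two runs side by side.
    Get-Counter -Counter $counters -SampleInterval 1 -MaxSamples 60 |
        Export-Counter -Path "$env:TEMP\broker-sample.blg" -FileFormat BLG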

(Just in case someone simply knows the answer to my particular problem: I'm talking about the "WCF Broker" of MS HPC Server 2008 R2, running on Windows Server 2008 R2 under Hyper-V.)


Solution

  • It turned out to be an issue completely unrelated to the network, caused by peculiarities of HPC Server's scheduling mechanism. I resolved it by setting the "serviceRequestPrefetchCount" option to 0 in the loadBalancing section of the WCF service config file.
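For reference, a minimal sketch of how that setting can look in the service config file. Only the loadBalancing element and the serviceRequestPrefetchCount attribute come from the solution above; the enclosing section name is an assumption about the surrounding layout and may differ in your config:

    <!-- Sketch only: <loadBalancing> and serviceRequestPrefetchCount are from
         the solution above; the enclosing <microsoft.Hpc.Broker> section name
         is an assumption about the surrounding layout. -->
    <microsoft.Hpc.Broker>
      <loadBalancing serviceRequestPrefetchCount="0" />
    </microsoft.Hpc.Broker>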