We have a microservice-oriented backend stack. All of the microservices are built on top of Nancy and registered as Windows services with Topshelf.
One of the services, which handles most of the traffic (~5000 req/s), started to have a thread pool starvation problem on 3 out of 8 servers.
This is the exception we are getting when hitting a specific endpoint:
System.InvalidOperationException: There were not enough free threads in the ThreadPool to complete the operation.
at System.Net.HttpWebRequest.BeginGetResponse(AsyncCallback callback, Object state)
at System.Net.Http.HttpClientHandler.StartGettingResponse(RequestState state)
at System.Net.Http.HttpClientHandler.StartRequest(Object obj)
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at RandomNamedClient.<GetProductBySkuAsync>d__20.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at ProductService.<GetBySkuAsync>d__3.MoveNext() in ...\ProductService.cs:line 34
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at ProductModule.<>c__DisplayClass15.<<.ctor>b__b>d__1d.MoveNext() in ...\ProductModule.cs:line 32
This endpoint calls another service (which is outside my team's domain) to get product data. It is implemented as follows:
Get["/product/sku/{sku}", true] = async (parameters, ctx) =>
{
    string sku = parameters.sku;
    var product = await productService.GetBySkuAsync(sku);
    return Response.AsJson(new ProductRepresentation(product));
};
ProductService.GetBySkuAsync(string sku) implementation:
public async Task<Product> GetBySkuAsync(string sku)
{
    var productDto = await randomNamedClient.GetProductBySkuAsync(sku);
    if (productDto == null)
    {
        throw new ProductDtoNotFoundException("sku", sku);
    }
    var variantDto = productDto.VariantList.FirstOrDefault(v => v.Sku == sku);
    if (variantDto == null)
    {
        throw new ProductVariantDtoNotFoundException("sku", sku);
    }
    return MapVariantDtoToProduct(variantDto, productDto);
}
RandomNamedClient.GetProductBySkuAsync(string sku) implementation (it's from an internal package):
public async Task<ProductDto> GetProductBySkuAsync(string sku)
{
    HttpResponseMessage result = await this._serviceClient.GetAsync("Product?Sku=" + sku);
    return result == null || result.StatusCode != HttpStatusCode.OK ? (ProductDto) null : this.Decompress<ProductDto>(result);
}
RandomNamedClient.Decompress<T>(HttpResponseMessage response) implementation:
private T Decompress<T>(HttpResponseMessage response)
{
    if (!response.Content.Headers.ContentEncoding.Contains("gzip"))
        return HttpContentExtensions.ReadAsAsync<T>(response.Content).Result;
    using (GZipStream gzipStream = new GZipStream((Stream) new MemoryStream(response.Content.ReadAsByteArrayAsync().Result), CompressionMode.Decompress))
    {
        byte[] buffer = new byte[8192];
        using (MemoryStream memoryStream = new MemoryStream())
        {
            int count;
            do
            {
                count = gzipStream.Read(buffer, 0, 8192);
                if (count > 0)
                    memoryStream.Write(buffer, 0, count);
            }
            while (count > 0);
            return JsonConvert.DeserializeObject<T>(Encoding.UTF8.GetString(memoryStream.ToArray()));
        }
    }
}
All of our services are built as Release/32-bit. We haven't tweaked anything related to thread pool usage.
The biggest problem I see with this code is the Decompress<T> method, which blocks on async operations using Task.Result. This can delay the return of the thread currently processing the request to the thread pool or, even worse, cause deadlocks in your code (this is exactly why you shouldn't block on async code). I'm not sure whether you've seen those requests run to completion, but if NancyFX is marshaling the synchronization context for you (which it seems to do), that may very well be the root cause of the thread pool starvation.
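To make the cost of blocking concrete, here is a minimal sketch that is not taken from the code in the question. FakeIoAsync is a hypothetical stand-in for an I/O call such as ReadAsByteArrayAsync, and StarvationDemo, HandleBlocking, HandleAsync and MeasureAsync are illustrative names. It runs the same simulated I/O once with Task.Result on pool threads and once with await, and prints the elapsed time for each. The exact timings depend on the machine and on the thread pool's current limits, so none are claimed here.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class StarvationDemo
{
    // Hypothetical stand-in for an I/O-bound call; no thread is held during the delay.
    private static async Task<int> FakeIoAsync()
    {
        await Task.Delay(200);
        return 1;
    }

    // Occupies a thread pool thread for the whole duration of the I/O,
    // which is what Task.Result does inside a request handler.
    public static Task<int> HandleBlocking()
    {
        return Task.Run(() => FakeIoAsync().Result);
    }

    // Gives the thread back to the pool while the I/O is in flight.
    public static async Task<int> HandleAsync()
    {
        return await FakeIoAsync();
    }

    // Starts `requests` handlers at once and measures how long they all take.
    public static async Task<TimeSpan> MeasureAsync(Func<Task<int>> handler, int requests)
    {
        var stopwatch = Stopwatch.StartNew();
        await Task.WhenAll(Enumerable.Range(0, requests).Select(_ => handler()));
        return stopwatch.Elapsed;
    }

    public static async Task Main()
    {
        const int requests = 64; // arbitrary; larger values make the gap more visible
        Console.WriteLine("blocking: " + await MeasureAsync(HandleBlocking, requests));
        Console.WriteLine("awaiting: " + await MeasureAsync(HandleAsync, requests));
    }
}
```

When every available pool thread is parked in Task.Result, the continuations that would complete those tasks also have to wait for a free thread, which is the starvation pattern described above.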
You can fix this by making all of the I/O work inside that method async as well, taking advantage of the naturally asynchronous APIs those classes already expose. Alternatively, and I definitely don't recommend doing this, you could use ConfigureAwait(false) everywhere.
(Side note: you can simplify your code by using Stream.CopyToAsync().)
A proper async implementation would look like this:
private async Task<T> DecompressAsync<T>(HttpResponseMessage response)
{
    if (!response.Content.Headers.ContentEncoding.Contains("gzip"))
        return await response.Content.ReadAsAsync<T>();

    const int bufferSize = 8192;
    using (GZipStream gzipStream = new GZipStream(
        new MemoryStream(
            await response.Content.ReadAsByteArrayAsync()),
        CompressionMode.Decompress))
    using (MemoryStream memoryStream = new MemoryStream())
    {
        await gzipStream.CopyToAsync(memoryStream, bufferSize);
        return JsonConvert.DeserializeObject<T>(
            Encoding.UTF8.GetString(memoryStream.ToArray()));
    }
}
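If you want to exercise the gzip branch in isolation, without the HTTP client or Json.NET, a self-contained sketch could look like the following. GzipAsyncDemo, Compress and DecompressToStringAsync are hypothetical names introduced only for this example, and the JSON string is made-up sample data. It uses the same CopyToAsync pattern as above, but on a plain byte array.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Threading.Tasks;

public static class GzipAsyncDemo
{
    // Helper for the demo only: gzip-compresses a UTF-8 string into a byte array.
    public static byte[] Compress(string text)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                byte[] bytes = Encoding.UTF8.GetBytes(text);
                gzip.Write(bytes, 0, bytes.Length);
            } // disposing the GZipStream writes the remaining compressed data
            return output.ToArray(); // ToArray still works after the stream is closed
        }
    }

    // Mirrors the gzip branch of DecompressAsync<T>, minus the JSON deserialization.
    public static async Task<string> DecompressToStringAsync(byte[] gzipBytes)
    {
        const int bufferSize = 8192;
        using (var gzipStream = new GZipStream(
            new MemoryStream(gzipBytes), CompressionMode.Decompress))
        using (var memoryStream = new MemoryStream())
        {
            await gzipStream.CopyToAsync(memoryStream, bufferSize);
            return Encoding.UTF8.GetString(memoryStream.ToArray());
        }
    }

    public static async Task Main()
    {
        string json = "{\"Sku\":\"ABC-123\"}"; // made-up sample payload
        string roundTripped = await DecompressToStringAsync(Compress(json));
        Console.WriteLine(roundTripped == json); // prints "True"
    }
}
```

Whichever variant you use, the call site in GetProductBySkuAsync has to await it, presumably as await this.DecompressAsync<ProductDto>(result). Calling .Result on the new method would bring the blocking straight back.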