.net, architecture, microservices, dotnet-httpclient

Example of bulkheads in .NET microservices


In “Building Microservices” (O’Reilly) by Sam Newman, there’s a section named Bulkheads which is part of a chapter that talks about ways to prevent clogged microservices from messing with the whole system.

As stated in this section, an example of a bulkhead would be having separate connection pools to connect to each downstream service.

The author is talking about synchronous calls to downstream services, so I am reading the above as “pools of HTTP clients”.

But, in .NET, it's increasingly considered best practice to reuse a single (singleton) HTTP client in order to improve scalability.

Am I right in thinking that, in .NET, this sort of bulkhead would not be applicable?

What other types of bulkheads should we be more concerned about, if any?

Bulkheads page


Solution

  • I would like to explain a few things so you have a complete picture and can understand this pattern better.

    Sockets and TCP

    Say you have three services, A, B, and C, and on every client request you have to call each of them over HTTP. Every time you create a new HttpClient, a TCP connection is created and a socket is opened underneath. The number of sockets has a hard limit, and with a very high volume of HTTP calls you can end up chewing through all of the available socket connections. That's why a single HttpClient should be reused. In .NET Core you can use HttpClientFactory (IHttpClientFactory) to achieve this. So if you have three services to call over HTTP, you end up with three separate HTTP connections (sockets) underneath, which will be reused.
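    Here is a minimal sketch of that registration, assuming an ASP.NET Core minimal API app; the client names and base addresses ("ServiceA", https://service-a.internal, and so on) are made up for illustration.

        using System;
        using System.Net.Http;
        using Microsoft.AspNetCore.Builder;
        using Microsoft.Extensions.DependencyInjection;

        var builder = WebApplication.CreateBuilder(args);

        // One named client per downstream service; the factory caches and reuses
        // the underlying handler (and therefore its sockets) for each name.
        builder.Services.AddHttpClient("ServiceA", c => c.BaseAddress = new Uri("https://service-a.internal"));
        builder.Services.AddHttpClient("ServiceB", c => c.BaseAddress = new Uri("https://service-b.internal"));
        builder.Services.AddHttpClient("ServiceC", c => c.BaseAddress = new Uri("https://service-c.internal"));

        var app = builder.Build();

        // Consumers ask the factory for a client by name instead of new-ing one up per request.
        app.MapGet("/aggregate", async (IHttpClientFactory factory) =>
        {
            var clientA = factory.CreateClient("ServiceA");
            return await clientA.GetStringAsync("/api/values");
        });

        app.Run();

    Because the factory pools and reuses the message handlers behind each name, repeated calls to the same downstream service reuse the same sockets instead of opening new ones.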

    Thread Pool

    The other part is the thread pool. Even when you call through a shared/singleton HttpClient, you still need a thread to service each request. Say you have 100 threads in total available to handle client requests; anything above 100 concurrent requests will queue up. Now say you are calling the three services independently over HTTP with this pool of 100 threads. On the happy path each service responds in time: when a thread finishes its work (the HTTP request completes), it returns to the pool to pick up the next client request from the queue. During all of this, the 100 threads are using the 3 shared HttpClient instances to call the external services, and there are only 3 sockets underneath. So far, so good.

    Failing Service

    Now let's say one service is slow or down. The thread pool (100 threads in this case) is shared, so any thread calling the degraded service takes longer to complete its request, or eventually times out, before it comes back to the pool. The other two services are still healthy and could respond, but more and more client requests are queuing up behind threads stuck on the slow service. At that point, requests by the consumer to the healthy services are affected as well. Eventually the consumer can no longer send requests to any service, not just the original unresponsive one: all available threads are stuck on the degraded service while the queue keeps growing. Other consumers are no longer able to consume the service either, causing a cascading failure.

    Bulkhead to the Rescue

    This is where the bulkhead pattern comes to the rescue. Partition service instances into different groups, based on consumer load and availability requirements. This design helps isolate failures and allows you to sustain service functionality for some consumers, even during a failure.

    A consumer can also partition resources, to ensure that resources used to call one service don't affect the resources used to call another service. For example, a consumer that calls multiple services may be assigned a connection pool for each service. If a service begins to fail, it only affects the connection pool assigned for that service, allowing the consumer to continue using the other services.

    Applying that to the example above, you would allocate roughly 33 threads to each service. Now a failing service only affects the threads allocated to it; the healthy services keep using their own allocations without any problem and keep fulfilling client requests.
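    As a rough, hand-rolled illustration of that partitioning (not a prescribed implementation), you could cap concurrency per downstream service with one SemaphoreSlim per service, so callers of a slow service only exhaust that service's slots; the class and the limits below are hypothetical.

        using System.Collections.Generic;
        using System.Net.Http;
        using System.Threading;
        using System.Threading.Tasks;

        // One concurrency limit per downstream service (the "33 threads each" idea).
        // If ServiceB is slow, only its 33 slots fill up; ServiceA and ServiceC keep
        // their own capacity and continue serving requests.
        public class PartitionedHttpCaller
        {
            private readonly IHttpClientFactory _factory;
            private readonly Dictionary<string, SemaphoreSlim> _bulkheads = new()
            {
                ["ServiceA"] = new SemaphoreSlim(33),
                ["ServiceB"] = new SemaphoreSlim(33),
                ["ServiceC"] = new SemaphoreSlim(33),
            };

            public PartitionedHttpCaller(IHttpClientFactory factory) => _factory = factory;

            public async Task<string> GetAsync(string serviceName, string path)
            {
                var bulkhead = _bulkheads[serviceName];
                await bulkhead.WaitAsync();   // waits only behind callers of this same service
                try
                {
                    var client = _factory.CreateClient(serviceName);
                    return await client.GetStringAsync(path);
                }
                finally
                {
                    bulkhead.Release();       // free the slot for the next caller of this service
                }
            }
        }

    A real bulkhead usually also bounds (or outright rejects) the waiting queue once the slots are full, rather than letting callers pile up indefinitely; that is what the Polly policy in the next section gives you out of the box.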

    .NET Core and Polly

    Polly is a well-known library for handling these kinds of situations. It fits naturally with .NET Core, and you can attach multiple policies to an HttpClient, including a bulkhead policy.

    You can find more about Polly at https://github.com/App-vNext/Polly
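    For instance, with the Microsoft.Extensions.Http.Polly package a bulkhead policy can be attached to each named client registration. The sketch below assumes the same hypothetical "ServiceA"/"ServiceB"/"ServiceC" clients as earlier, with arbitrary limits; builder.Services is the app's service collection.

        using System.Net.Http;
        using Microsoft.Extensions.DependencyInjection;
        using Polly;

        // Each downstream gets its own bulkhead: at most 33 concurrent calls plus
        // 10 queued; anything beyond that fails fast with BulkheadRejectedException
        // instead of tying up the shared thread pool.
        builder.Services.AddHttpClient("ServiceA")
            .AddPolicyHandler(Policy.BulkheadAsync<HttpResponseMessage>(
                maxParallelization: 33, maxQueuingActions: 10));

        builder.Services.AddHttpClient("ServiceB")
            .AddPolicyHandler(Policy.BulkheadAsync<HttpResponseMessage>(
                maxParallelization: 33, maxQueuingActions: 10));

        builder.Services.AddHttpClient("ServiceC")
            .AddPolicyHandler(Policy.BulkheadAsync<HttpResponseMessage>(
                maxParallelization: 33, maxQueuingActions: 10));

    (This is the classic Polly v7-style syntax; newer Polly versions express the same idea through resilience pipelines and concurrency limiters, so check the repository above for the API that matches your version.)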

    Hope that helps!