
HttpClient with multiple proxies while handling socket exhaustion and DNS recycling


A friend and I are working on a fun project in which we have to execute hundreds of HTTP requests, each through a different proxy. Imagine something like the following:

for (int i = 0; i < 20; i++)
{
    HttpClientHandler handler = new HttpClientHandler { Proxy = new WebProxy(randomProxy, true) };

    using (var client = new HttpClient(handler))
    {
        using (var request = new HttpRequestMessage(HttpMethod.Get, "http://x.com"))
        {
            var response = await client.SendAsync(request);

            if (response.IsSuccessStatusCode)
            {
                string content = await response.Content.ReadAsStringAsync();
            }
        }

        using (var request2 = new HttpRequestMessage(HttpMethod.Get, "http://x.com/news"))
        {
            var response = await client.SendAsync(request2);

            if (response.IsSuccessStatusCode)
            {
                string content = await response.Content.ReadAsStringAsync();
            }
        }
    }
}

By the way, we are using .NET Core (a console application for now). I know there are many threads about socket exhaustion and DNS recycling, but this case is different because of the multiple proxies.

If we use a singleton instance of HttpClient, just like everyone suggests:

  • We can't set more than one proxy, because the proxy is set when the HttpClient is instantiated and cannot be changed afterwards.
  • It doesn't respect DNS changes. A reused HttpClient instance holds on to its sockets until they are closed, so if a DNS record for the server is updated, the client will not notice until the socket is closed. One workaround is to disable keep-alive (send a Connection: close header), so the socket is closed after each request, but that leads to sub-optimal performance. The second way is to use ServicePoint (although, as far as I can tell, ServicePointManager settings may have no effect on HttpClient in .NET Core):
// FindServicePoint takes a Uri (or a string plus a proxy), not a bare string.
ServicePointManager.FindServicePoint(new Uri("http://x.com"))
    .ConnectionLeaseTimeout = Convert.ToInt32(TimeSpan.FromSeconds(15).TotalMilliseconds);

ServicePointManager.DnsRefreshTimeout = Convert.ToInt32(TimeSpan.FromSeconds(5).TotalMilliseconds);
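
For reference, the keep-alive workaround mentioned above is usually expressed in HttpClient through the ConnectionClose property on the default request headers. This is only a minimal sketch, written as a top-level console program; it sends nothing over the network and just shows where the setting lives.

```csharp
using System;
using System.Net.Http;

// A single shared client; nothing is sent over the network in this sketch.
var client = new HttpClient();

// Ask for "Connection: close" on every request, so each socket is closed
// after one use instead of being kept alive and reused.
client.DefaultRequestHeaders.ConnectionClose = true;

Console.WriteLine(client.DefaultRequestHeaders.ConnectionClose); // True
```

The trade-off is the one described above: every request pays for a new TCP (and possibly TLS) handshake.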

On the other hand, disposing HttpClient after each use (as in my example above) means creating many HttpClient instances, which leaves many sockets in the TIME_WAIT state. TIME_WAIT indicates that the local endpoint (this side) has closed the connection.

I'm aware of SocketsHttpHandler and IHttpClientFactory, but as far as I can see they don't solve the problem of using a different proxy per request.

var socketsHandler = new SocketsHttpHandler
{
    PooledConnectionLifetime = TimeSpan.FromMinutes(10),
    PooledConnectionIdleTimeout = TimeSpan.FromMinutes(5),
    MaxConnectionsPerServer = 10
};

// Cannot set a different proxy for each request
var client = new HttpClient(socketsHandler);

What is the most sensible decision that can be made?


Solution

  • First of all, I want to mention that @Stephen Cleary's example works fine if the proxies are known at compile-time, but in my case they are known at runtime. I forgot to mention that in the question, so it's my fault.

    Thanks to @aepot for pointing that out.

    That's the solution I came up with (credits @mcont):

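    using System;
    using System.Collections.Concurrent;
    using System.Net;
    using System.Net.Http;
    using Flurl.Http;

    /// <summary>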
    /// A wrapper class for <see cref="FlurlClient"/>, which solves socket exhaustion and DNS recycling.
    /// </summary>
    public class FlurlClientManager
    {
        /// <summary>
        /// Static collection, which stores the clients that are going to be reused.
        /// </summary>
        private static readonly ConcurrentDictionary<string, IFlurlClient> _clients = new ConcurrentDictionary<string, IFlurlClient>();
    
        /// <summary>
        /// Gets the available clients.
        /// </summary>
        /// <returns></returns>
        public ConcurrentDictionary<string, IFlurlClient> GetClients()
            => _clients;
    
        /// <summary>
        /// Creates a new client or gets an existing one.
        /// </summary>
        /// <param name="clientName">The client name.</param>
        /// <param name="proxy">The proxy URL.</param>
        /// <returns>The <see cref="FlurlClient"/>.</returns>
        public IFlurlClient CreateOrGetClient(string clientName, string proxy = null)
        {
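            // The factory overload creates a client only when the name is not cached yet;
            // passing CreateClient(proxy) directly would build (and leak) a handler on every call.
            return _clients.AddOrUpdate(clientName, _ => CreateClient(proxy), (_, client) =>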
            {
                return client.IsDisposed ? CreateClient(proxy) : client;
            });
        }
    
        /// <summary>
        /// Disposes a client. This leaves a socket in TIME_WAIT state for 240 seconds but it's necessary in case a client has to be removed from the list.
        /// </summary>
        /// <param name="clientName">The client name.</param>
        /// <returns>Returns true if the operation is successful.</returns>
        public bool DeleteClient(string clientName)
        {
            // TryRemove avoids the KeyNotFoundException that the indexer throws for an unknown name.
            if (!_clients.TryRemove(clientName, out var client))
            {
                return false;
            }

            client.Dispose();
            return true;
        }
    
        private IFlurlClient CreateClient(string proxy = null)
        {
            var handler = new SocketsHttpHandler()
            {
                Proxy = proxy != null ? new WebProxy(proxy, true) : null,
                PooledConnectionLifetime = TimeSpan.FromMinutes(10)
            };
    
            var client = new HttpClient(handler);
    
            return new FlurlClient(client);
        }
    }
    

    A proxy per request means an additional socket for each request (another HttpClient instance).

    In the solution above, a ConcurrentDictionary stores the clients so that I can reuse them, which is the whole point of HttpClient. I can use the same proxy for 5 requests before it gets blocked by API limitations. I forgot to mention that in the question as well.

    As you've seen, there are two approaches to socket exhaustion and DNS recycling: IHttpClientFactory and SocketsHttpHandler. The first one doesn't suit my case, because the proxies I'm using are known at runtime, not at compile-time. The solution above uses the second approach.

    If you have the same problem, you can read the following issue on GitHub. It explains everything.

    I'm open to improvements, so feel free to suggest them.
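
As one possible variation on the approach above, here is a minimal sketch of the same client-per-proxy cache using plain HttpClient without Flurl. The names clients, CreateClient and GetClient and the proxy addresses are placeholders made up for illustration, and it is written as a top-level console program. It wraps each entry in Lazy<T>, on the assumption that you want the handler for a given proxy to be built only once even when several threads ask for it at the same time.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;
using System.Net.Http;

// One long-lived HttpClient per proxy URL, created lazily on first use.
var clients = new ConcurrentDictionary<string, Lazy<HttpClient>>();

// Hypothetical helper: builds a client whose handler is bound to a single proxy.
HttpClient CreateClient(string proxyUrl)
{
    var handler = new SocketsHttpHandler
    {
        Proxy = new WebProxy(proxyUrl),
        UseProxy = true,
        // Recycle pooled connections periodically so DNS changes are picked up.
        PooledConnectionLifetime = TimeSpan.FromMinutes(10)
    };
    return new HttpClient(handler);
}

// Hypothetical helper: returns the cached client for a proxy.
// Only the Lazy<T> stored in the dictionary is ever evaluated,
// so CreateClient runs at most once per proxy URL.
HttpClient GetClient(string proxyUrl) =>
    clients.GetOrAdd(proxyUrl, url => new Lazy<HttpClient>(() => CreateClient(url))).Value;

// Placeholder proxy addresses; nothing is contacted until a request is sent.
var first = GetClient("http://127.0.0.1:8080");
var again = GetClient("http://127.0.0.1:8080");
var other = GetClient("http://127.0.0.1:8081");

Console.WriteLine(object.ReferenceEquals(first, again)); // True: same proxy, same client
Console.WriteLine(object.ReferenceEquals(first, other)); // False: different proxy, different client
```

Keying the cache by proxy URL rather than by an arbitrary client name is a design choice of this sketch; it keeps the lookup and the proxy from drifting apart, but it does not cover retiring a proxy once it gets blocked.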