Tags: c#, .net, dns, async-await, reverse-dns

Speed up reverse DNS lookups for large batch of IPs


For analytics purposes, I'd like to perform reverse DNS lookups on large batches of IPs. "Large" means at least tens of thousands per hour. I'm looking for ways to increase the processing rate, i.e. to lower the processing time per batch.

Wrapping the async version of Dns.GetHostEntry in awaitable tasks has already helped a lot (compared to sequential requests), giving a throughput of approx. 100-200 IPs/second:

static async Task DoReverseDnsLookups()
{
    // in reality, thousands of IPs
    var ips = new[] { "173.194.121.9", "173.252.110.27", "98.138.253.109" }; 
    var hosts = new Dictionary<string, string>();

    var tasks =
        ips.Select(
            ip =>
                Task.Factory.FromAsync(Dns.BeginGetHostEntry,
                    (Func<IAsyncResult, IPHostEntry>) Dns.EndGetHostEntry, 
                    ip, null)
                    .ContinueWith(t => 
                    hosts[ip] = ((t.Exception == null) && (t.Result != null)) 
                               ? t.Result.HostName : null));

    var start = DateTime.UtcNow;
    await Task.WhenAll(tasks);
    var end = DateTime.UtcNow;

    Console.WriteLine("Resolved {0} IPs in {1}, that's {2}/sec.", 
      ips.Count(), end - start, 
      ips.Count() / (end - start).TotalSeconds);
}

Any ideas how to further improve the processing rate?

For instance, is there any way to send a batch of IPs to the DNS server?

Btw, I'm assuming that under the covers, I/O Completion Ports are used by the async methods - correct me if I'm wrong please.


Solution

    • As always, I would suggest using TPL Dataflow's ActionBlock instead of firing all requests at once and waiting for all of them to complete. MaxDegreeOfParallelism is only an upper bound: with a high value, the TPL decides for itself how many calls to actually run concurrently, which can lead to better utilization of resources:

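    // hosts must be thread-safe (e.g. a ConcurrentDictionary<string, string>),
    // because the delegate below can run on many threads at once.
    var block = new ActionBlock<string>(
        async ip =>
        {
            try
            {
                var host = (await Dns.GetHostEntryAsync(ip)).HostName;
                if (!string.IsNullOrWhiteSpace(host))
                {
                    hosts[ip] = host;
                }
            }
            catch
            {
                // The lookup failed (e.g. no PTR record); skip this IP.
            }
        },
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5000 });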
    
    • I would also suggest adding a cache and making sure you don't resolve the same IP more than once.

    • When you use .NET's Dns class, it includes some fallbacks besides DNS (e.g. LLMNR), which can make it very slow. If all you need are DNS queries, you might want to use a dedicated library like ARSoft.Tools.Net.
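
To make the first two suggestions concrete, here is a minimal sketch of how the block could be fed and awaited, with duplicates removed and results cached. The names ReverseDnsBatch, ResolveAllAsync and DnsResolve, the injectable resolve delegate, and the default parallelism of 100 are all assumptions made for illustration, not part of the original answer. Caching failed lookups indefinitely is also just one possible policy.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // may require the System.Threading.Tasks.Dataflow package

static class ReverseDnsBatch
{
    // Process-wide cache: IP -> host name (null means the lookup failed).
    static readonly ConcurrentDictionary<string, string> Cache =
        new ConcurrentDictionary<string, string>();

    // 'resolve' is injected so the pipeline can be exercised without a network.
    public static async Task<IDictionary<string, string>> ResolveAllAsync(
        IEnumerable<string> ips,
        Func<string, Task<string>> resolve,
        int maxParallelism = 100) // illustrative value; tune by measurement
    {
        var block = new ActionBlock<string>(
            async ip =>
            {
                try { Cache[ip] = await resolve(ip); }
                catch { Cache[ip] = null; } // remember failures so they are not retried
            },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxParallelism });

        // Drop duplicates within the batch, then skip IPs that are already cached.
        var batch = ips.Distinct().ToList();
        foreach (var pending in batch.Where(ip => !Cache.ContainsKey(ip)))
        {
            block.Post(pending);
        }

        // Signal that no more input is coming, then wait for in-flight lookups.
        block.Complete();
        await block.Completion;

        return batch.ToDictionary(ip => ip, ip => Cache[ip]);
    }

    // Resolver backed by the framework's Dns class.
    public static async Task<string> DnsResolve(string ip) =>
        (await Dns.GetHostEntryAsync(ip)).HostName;
}
```

A call would then look like `var hosts = await ReverseDnsBatch.ResolveAllAsync(ips, ReverseDnsBatch.DnsResolve);`. Passing a different delegate (for example one backed by a dedicated DNS library) swaps the resolver without touching the pipeline.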


    P.S: Some remarks about your code sample:

    1. You should be using GetHostEntryAsync instead of FromAsync.
    2. The continuations can potentially run on different threads, so you should really be using a ConcurrentDictionary.
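
Applied to the sample from the question, the two remarks might look roughly like the sketch below. The class name ReverseDnsSample and the helper TryResolveAsync are hypothetical names introduced here. As in the original, a failed lookup is stored as null.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Net;
using System.Threading.Tasks;

static class ReverseDnsSample
{
    // Returns the host name, or null if the lookup fails for any reason.
    static async Task<string> TryResolveAsync(string ip)
    {
        try { return (await Dns.GetHostEntryAsync(ip)).HostName; }
        catch (Exception) { return null; }
    }

    public static async Task<ConcurrentDictionary<string, string>> DoReverseDnsLookups(string[] ips)
    {
        // Thread-safe: the code after each await may run on any thread-pool thread.
        var hosts = new ConcurrentDictionary<string, string>();

        // One task per IP; each stores its own result once the lookup finishes.
        var tasks = ips.Select(async ip =>
        {
            hosts[ip] = await TryResolveAsync(ip);
        });

        await Task.WhenAll(tasks);
        return hosts;
    }
}
```

This still starts every lookup at once, so it addresses only the two remarks above, not the throttling that the ActionBlock provides.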