python dnspython

Is it possible to resolve a list of domains in parallel using two or more DNS servers?


I am totally new to Python and, to be honest, to programming in general. I wrote my first script, which resolves a list of domains, with Google's help and some luck, I guess.

The list contains about 100,000 domains, and I need to reduce the time this task takes, because it will be a recurring task and it currently needs about two hours. I could split the list and run a separate copy of the script on each part, but it would be great if I could set up two or more DNS servers and resolve in parallel through them. Or are there other ways to reduce the running time?

I have read the dnspython docs, but they are too complex for my Python skill level (which is ~0).

import socket
import dns.resolver

w = open ('/home/dalt/pyth/resolved.txt', "w")
x = open ('/home/dalt/pyth/not_resolved.txt', "w")
with open('/home/dalt/pyth/domains2.txt') as f:
    my_list = [line.strip() for line in f.readlines()]

resolver = dns.resolver.Resolver()
resolver.nameservers=[socket.gethostbyname('212.xxx.xxx.134')]

for domain in my_list:
    try:
        q = resolver.query(domain, 'A')
        for ipval in q:
            print(ipval, file=w)
    except dns.resolver.NXDOMAIN:
        print(domain, 'NXDOMAIN', file=x)
    except dns.resolver.NoNameservers:
        print(domain, 'NoNameservers',file=x)
    except dns.resolver.NoAnswer:
        print(domain, 'NoAnswer',file=x)
    except dns.name.BadEscape:
        print(domain, 'BadEscape',file=x)

# f is already closed by the with block; the two output files are not.
w.close()
x.close()

Solution

  • I'm not very experienced with networking, but I would guess that most of your script's execution time is spent communicating with the DNS server. That means your CPU is mostly just waiting for data, so you should be able to speed the task up by using multiple threads.

    The easiest way is to use a ThreadPool:

    from multiprocessing.pool import ThreadPool
    
    import dns.name
    import dns.resolver
    
    my_list = [
        "www.google.com",
        "www.facebook.com",
        "doesnt.exist",
    ]
    
    resolver = dns.resolver.Resolver()
    resolver.nameservers = ["8.8.4.4", "8.8.8.8"]
    
    def resolve(domain):
        # Return (domain, IPs, error name); only the main thread writes files,
        # so output lines from different threads cannot interleave.
        try:
            # dnspython 2.x deprecates query() in favor of resolve().
            q = resolver.query(domain, "A")
            return domain, [str(ipval) for ipval in q], None
        except (dns.resolver.NXDOMAIN, dns.resolver.NoNameservers,
                dns.resolver.NoAnswer, dns.resolver.Timeout,
                dns.name.BadEscape) as e:
            return domain, [], type(e).__name__
    
    # Increasing the number of threads may speed things up.
    with ThreadPool(processes=10) as pool:
        results = pool.map(resolve, my_list)
    
    with open("resolved.txt", "w") as w, open("not_resolved.txt", "w") as x:
        for domain, ips, error in results:
            for ipval in ips:
                print(domain, ipval, file=w)
            if error:
                print(domain, error, file=x)
    

    Results:

    $ cat not_resolved.txt
    doesnt.exist NXDOMAIN
    $ cat resolved.txt
    www.google.com 172.217.20.196
    www.facebook.com 31.13.81.36
    

    The code above does not try to distribute the list of domains among the available DNS servers, unless the dnspython package does so under the hood. Even so, I would expect a single DNS server to respond quickly to concurrent queries, since it probably handles them in multiple threads itself.
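
To spread the lookups across servers explicitly, one option is to build one resolver per nameserver and hand out the domains round-robin. The following is a minimal sketch under assumptions, not a benchmarked replacement for the code above: `make_lookup` and `resolve_all` are hypothetical helper names, and it calls `Resolver.resolve()`, which assumes dnspython 2.x (older versions use `query()`).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def make_lookup(nameserver):
    """Return a function that resolves A records through one DNS server."""
    # Imported here so the scheduling code below also runs without dnspython.
    import dns.exception
    import dns.resolver

    resolver = dns.resolver.Resolver(configure=False)  # skip system DNS config
    resolver.nameservers = [nameserver]

    def lookup(domain):
        try:
            return [str(ip) for ip in resolver.resolve(domain, "A")]
        except dns.exception.DNSException as exc:
            # NXDOMAIN, NoAnswer, Timeout, etc. all derive from DNSException.
            return type(exc).__name__

    return lookup


def resolve_all(domains, lookups, workers=10):
    """Pair each domain with a lookup function round-robin, then run the
    lookups concurrently. Returns (domain, result) pairs in input order."""
    jobs = list(zip(domains, cycle(lookups)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda job: job[1](job[0]), jobs))
    return [(domain, result) for (domain, _), result in zip(jobs, results)]


# Usage (needs dnspython and network access):
#   lookups = [make_lookup(ip) for ip in ("8.8.8.8", "8.8.4.4")]
#   for domain, result in resolve_all(my_list, lookups):
#       print(domain, result)
```

Each result is either a list of IP strings or the name of the exception that was raised, so the caller can write the two output files from a single thread. dnspython's `Resolver` also has a `rotate` attribute that, per its documentation, load-balances randomly among the configured nameservers; that may be a simpler route, though neither approach is benchmarked here.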