python-3.x, tshark, pyshark

Read domain names from a pcapng file


I want to extract the domain names from DNS packets (requests and responses) in a .pcapng file. This is the code I used:

import pyshark


def extract_domain_name(pkt):
    try:
        if pkt.dns.qry_name:
            #print (pkt.ip.src, pkt.dns.qry_name)
            return pkt.dns.qry_name
    except AttributeError:
        # ignore packets that have no DNS query name
        pass
    try:
        if pkt.dns.resp_name:
            print(pkt.ip.src, pkt.dns.resp_name)
            return pkt.dns.resp_name
    except AttributeError:
        # ignore packets that have no DNS response name
        pass

def process_pcapng_file(filename):
    # Open the pcapng file
    cap = pyshark.FileCapture(filename)

    # Extract the domain names from the DNS packets
    domains = set()
    for pkt in cap:
        if 'DNS' in pkt:
            domain = extract_domain_name(pkt)
            if domain is not None:
                domains.add(domain)
    return domains

It only extracts domain names from query packets, not from both queries and responses. What could the problem be?

I also tried using if pkt.dns.resp_name: without the try block, and I got an AttributeError.


Solution

  • Thanks for posting the sample capture; that helps.

    I think the reason your code works for me but not for you is that in your sample capture, the only replies are SERVFAIL messages:

    $ tcpdump -nn -r sample.pcap port domain | awk '{print $7}' | sort | uniq -c
      38365 A?
      38393 ServFail
    

    It looks like for SERVFAIL messages, pkt.dns will not have a resp_name attribute, presumably because these replies carry no resource records for the field to come from.

    > it only extract the domain name from query packets not from query and response

    Just to be explicit: in your sample capture, there are no valid query responses, so pkt.dns.resp_name is never defined.
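If you'd rather check this from Python than with tcpdump, here is a rough sketch that tallies DNS response codes. count_rcodes is a hypothetical helper, and it assumes your pyshark version exposes the Wireshark field dns.flags.rcode as pkt.dns.flags_rcode (a string such as "2"); packets without that field are counted as "no rcode". The stand-in packets are made up so the snippet runs without a capture file.

```python
from collections import Counter
from types import SimpleNamespace


def count_rcodes(packets):
    """Tally DNS response codes over an iterable of packets."""
    counts = Counter()
    for pkt in packets:
        dns = getattr(pkt, "dns", None)
        if dns is None:
            continue  # not a DNS packet
        # Assumption: dns.flags.rcode is exposed as flags_rcode.
        rcode = getattr(dns, "flags_rcode", None)
        counts[str(rcode) if rcode is not None else "no rcode"] += 1
    return counts


# Made-up stand-ins for pyshark packets, not taken from the sample capture.
demo_packets = [
    SimpleNamespace(dns=SimpleNamespace(qry_name="example.com")),
    SimpleNamespace(dns=SimpleNamespace(qry_name="example.com", flags_rcode="2")),
    SimpleNamespace(ip=SimpleNamespace(src="192.0.2.1")),
]
demo_counts = count_rcodes(demo_packets)
print(demo_counts)
```

With a real capture you would pass pyshark.FileCapture("sample.pcap") instead of demo_packets. Response code 2 is SERVFAIL, so if the capture really contains nothing but failed lookups, every reply should be counted under "2".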

    There are a couple of things to think about here:

    1. If your logic is effectively:

      if pkt.dns.qry_name:
        return pkt.dns.qry_name
      if pkt.dns.resp_name:
        return pkt.dns.resp_name
      

      You will never reach the second if statement because a query response also includes the original query (so you will always return pkt.dns.qry_name).

    2. Do you really care about resp_name? In all cases, either pkt.dns.resp_name will match pkt.dns.qry_name, or it won't exist.
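To make the first point concrete, here is a small sketch that runs the query-first order of checks against stand-in packets. The SimpleNamespace objects and the name example.com are made up for illustration; they only mimic the two attributes involved and are not real pyshark packets.

```python
from types import SimpleNamespace


def query_first(pkt):
    """Check qry_name before resp_name, as in the question."""
    if getattr(pkt.dns, "qry_name", None):
        return ("qry_name", pkt.dns.qry_name)
    if getattr(pkt.dns, "resp_name", None):
        return ("resp_name", pkt.dns.resp_name)
    return None


# Made-up stand-ins: a query, a successful answer, and a SERVFAIL reply.
query = SimpleNamespace(dns=SimpleNamespace(qry_name="example.com"))
answer = SimpleNamespace(
    dns=SimpleNamespace(qry_name="example.com", resp_name="example.com")
)
servfail = SimpleNamespace(dns=SimpleNamespace(qry_name="example.com"))

for label, pkt in [("query", query), ("answer", answer), ("servfail", servfail)]:
    print(label, query_first(pkt))
```

All three packets report qry_name, including the one that does carry a resp_name, because the first check already succeeds.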

    It seems to me you could simplify your code to:

    import pyshark

    def process_pcapng_file(filename):
        cap = pyshark.FileCapture(filename)

        # Every DNS packet, query or response, carries the query name.
        return set(
            pkt.dns.qry_name
            for pkt in cap
            if pkt.highest_layer == "DNS" and pkt.dns.qry_name
        )
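If some DNS packets in a capture turn out to have no qry_name at all, a slightly more defensive variant might look like the sketch below. collect_query_names is a hypothetical helper that accepts any iterable of packets, and the stand-in packets are made up so it can be run without a capture file. Unlike the highest_layer check, it accepts any packet that has a dns layer.

```python
from types import SimpleNamespace


def collect_query_names(packets):
    """Return the set of DNS query names found in an iterable of packets."""
    names = set()
    for pkt in packets:
        dns = getattr(pkt, "dns", None)        # None for non-DNS packets
        name = getattr(dns, "qry_name", None)  # None if the field is missing
        if name:
            names.add(str(name))
    return names


# Made-up stand-ins for pyshark packets.
demo_packets = [
    SimpleNamespace(dns=SimpleNamespace(qry_name="example.com")),
    SimpleNamespace(dns=SimpleNamespace(qry_name="example.com")),  # duplicate
    SimpleNamespace(dns=SimpleNamespace(qry_name="example.org")),
    SimpleNamespace(dns=SimpleNamespace()),                        # no query name
    SimpleNamespace(ip=SimpleNamespace(src="192.0.2.1")),          # not DNS
]
demo_names = collect_query_names(demo_packets)
print(sorted(demo_names))
```

With a real file you could pass pyshark.FileCapture(filename, display_filter="dns") as the packets argument, so that tshark filters out non-DNS traffic before Python sees it.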
    

    But if you want to use your existing extract_domain_name function, you'll need to reverse the checks for resp_name and qry_name:

    def extract_domain_name(pkt):
        try:
            if pkt.dns.resp_name:
                return pkt.dns.resp_name
        except AttributeError:
            pass
    
        try:
            if pkt.dns.qry_name:
                return pkt.dns.qry_name
        except AttributeError:
            pass
    

    You can make that a little shorter by replacing the exception handling with hasattr:

    def extract_domain_name(pkt):
        if hasattr(pkt.dns, "resp_name") and pkt.dns.resp_name:
            return pkt.dns.resp_name
    
        if hasattr(pkt.dns, "qry_name") and pkt.dns.qry_name:
            return pkt.dns.qry_name
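The same idea can be written with getattr and a default value, which avoids both the exception handling and the repeated attribute lookups. This is only a sketch: extract_domain_name_getattr is a hypothetical name, and the stand-in packets are made up for illustration.

```python
from types import SimpleNamespace


def extract_domain_name_getattr(pkt):
    """Prefer resp_name, fall back to qry_name, else return None."""
    dns = getattr(pkt, "dns", None)  # None for non-DNS packets
    return (
        getattr(dns, "resp_name", None)
        or getattr(dns, "qry_name", None)
        or None
    )


# Made-up stand-ins for pyshark packets.
answer = SimpleNamespace(
    dns=SimpleNamespace(qry_name="example.com", resp_name="example.com")
)
servfail = SimpleNamespace(dns=SimpleNamespace(qry_name="example.net"))
not_dns = SimpleNamespace(ip=SimpleNamespace(src="192.0.2.1"))

print(extract_domain_name_getattr(answer))
print(extract_domain_name_getattr(servfail))
print(extract_domain_name_getattr(not_dns))
```

Because the lookup on pkt itself also uses a default, this version returns None for packets without a dns layer instead of raising AttributeError.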