Tags: python, tuples, defaultdict

Python dictionary with a tuple and tuple count as the value


I have a .csv file containing packet header data from a Wireshark scan that I am iterating through line by line with a for loop. The file contains around 100,000 rows, many of which are repeated. I am trying to find how many times each destination IP address is accessed using the TCP protocol (6) on each port from 1 to 1024. Essentially I am trying to create something that looks like this:

{ip_address: {(protocol, port): count}}

That way I will know how many times each protocol/port combination used the IP address as a destination. So far I've tried this:

from collections import defaultdict

dst = defaultdict(list)
for pkt in csvfile:
    # Keep only packets whose TCP destination port is in 1-1024
    if 0 < pkt.tcpdport < 1025:
        tup = (pkt.proto, pkt.tcpdport)
        dst[pkt.ipdst].append(tup)

When I try to print this out I get a list of IP addresses with the protocol, port tuple listed multiple times per IP address. How can I get it so that I show the tuple followed by a count of how many times it occurs in each dictionary entry instead?


Solution

  • Currently, the line dst[pkt.ipdst].append(tup) tells Python to get the value associated with the IP address and append the tuple to it. Because dst is a defaultdict(list), that value is a list, so every matching packet adds another copy of its tuple to the list for that IP address. This is why you're seeing the same tuple listed multiple times per IP address.

    To fix this, change that line to dst[pkt.ipdst][tup] += 1. This tells Python to get the dictionary associated with the IP address, get the count associated with the tuple in that dictionary, and add 1. When printed, each IP address then shows every (protocol, port) tuple once, followed by its count.

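    Also, define dst as defaultdict(lambda: defaultdict(int)) so that the first time an IP address or a protocol/port combination is seen, its count starts at 0 instead of raising a KeyError. The inner default must be int rather than dict, because += 1 on an empty dict raises a TypeError.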
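Putting the pieces of the answer together, here is a minimal sketch of the counting loop. The Packet namedtuple, the count_destinations helper, and the sample rows are hypothetical stand-ins for whatever objects the CSV iteration actually yields; only the attribute names (ipdst, proto, tcpdport) come from the question. The inner mapping uses defaultdict(int) so that a missing (protocol, port) key starts at 0.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the rows of the CSV file; the field names mirror
# the attributes used in the question (ipdst, proto, tcpdport).
Packet = namedtuple("Packet", ["ipdst", "proto", "tcpdport"])


def count_destinations(packets):
    """Return {ip: {(proto, port): count}} for destination ports 1 through 1024."""
    # Outer default: a new inner mapping per IP; inner default: a count of 0.
    dst = defaultdict(lambda: defaultdict(int))
    for pkt in packets:
        if 0 < pkt.tcpdport < 1025:
            dst[pkt.ipdst][(pkt.proto, pkt.tcpdport)] += 1
    return dst


# Made-up sample data, for illustration only.
sample = [
    Packet("10.0.0.1", 6, 80),
    Packet("10.0.0.1", 6, 80),
    Packet("10.0.0.1", 6, 443),
    Packet("10.0.0.2", 6, 22),
    Packet("10.0.0.2", 6, 8080),  # port outside 1-1024, so it is skipped
]

counts = count_destinations(sample)
for ip, combos in counts.items():
    print(ip, dict(combos))
# 10.0.0.1 {(6, 80): 2, (6, 443): 1}
# 10.0.0.2 {(6, 22): 1}
```

Converting each inner mapping with dict(...) before printing only hides the defaultdict wrapper in the output; the counts themselves are unchanged.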