Tags: python-3.x, web-crawler, scapy, pcap

Writing a pcap file of a web crawler's TCP traffic


A URL request and sniff(count=x) don't work together. sniff(count=x) waits until it has captured x packets. Because I have to call it before the URL request, it blocks the program: the request never starts, so no packet is ever sniffed.

When I opened two terminal windows in Ubuntu, it worked. In the first window I started an interactive Python session and started the sniffer. Then I started the web crawler in the second window, and the sniffer in the first window received the packets correctly and printed them to the screen / wrote them to a pcap file.

The easiest way would be to write two scripts and start them from two different windows, but I want to do the whole job in one script: web crawling, sniffing the packets, and writing them to a pcap file.

Here is the code that does not work:

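import urllib.request

from scapy.all import sniff

class spider():
…
    def parse(self):
        # sniff() blocks until one matching packet arrives, so the request below is never sent
        a = sniff(filter="tcp and host 128.65.210.181", count=1)
        req = urllib.request.urlopen(self.next_url.replace(" ", ""))
        a.nsummary()
        charset = req.info().get_content_charset()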

The sniff() line blocks the program while it waits for the packet, which can never arrive because the request is only made on the next line. Swapping the two lines doesn't work either. I think the only way to solve this is to use parallelism, so I've also tried this:

import threading

from scapy.all import sniff

class protocoller():
    ...
    def run(self):
        self.pkt = sniff(count=5)  # and here it blocks
…
prot = protocoller()
Main.thr = threading.Thread(target=prot.run())
Main.thr.start()

I always thought a thread runs independently of the main program, but this one blocks it as if it were part of it. Any suggestions?

What I need is a solution in which the web crawler and the scapy-based IP/TCP logger (the protocoller) run independently of each other.

Could scapy's sr() function be an alternative?

https://scapy.readthedocs.io/en/latest/usage.html

Is it possible to put the request into a packet manually and write the received packets into the pcap file?


Solution

  • Your example doesn't show what's going on in the other threads, so I assume you have a second thread that makes the request, etc. If all of that is in order, the obvious error is here:

    Main.thr = threading.Thread(target=prot.run())
    

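    This calls prot.run() immediately, in the main thread, and passes its return value to the target parameter of Thread. Since run() calls sniff(count=5), the main thread blocks on that line before the new thread is even created. It should be: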

    Main.thr = threading.Thread(target=prot.run)
    

    This passes the method itself to Thread, which then calls it in the new thread once start() is called.
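The difference can be checked without scapy or root privileges. In the minimal sketch below, `Protocoller` is a hypothetical stand-in for the question's `protocoller`: its `run()` only records which thread executed it, instead of blocking in `sniff(count=5)`.

```python
import threading


class Protocoller:
    """Hypothetical stand-in for the question's protocoller class."""

    def __init__(self):
        self.ran_in = None

    def run(self):
        # The real version would block here in sniff(count=5);
        # this one just records the thread that executed it.
        self.ran_in = threading.current_thread()


# Wrong: wrong.run() is called right here, in the current (main) thread,
# and its return value (None) becomes the target, so the new thread has nothing to do.
wrong = Protocoller()
t1 = threading.Thread(target=wrong.run())
t1.start()
t1.join()

# Right: the bound method is passed uncalled; start() runs it in the new thread.
right = Protocoller()
t2 = threading.Thread(target=right.run, name="sniffer")
t2.start()
t2.join()

print(wrong.ran_in.name)
print(right.ran_in.name)
```

Run as a script, the first line printed is `MainThread` and the second is `sniffer`: with `target=wrong.run()` the work had already happened before `Thread` was constructed, and the thread itself had no target to run.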