Tags: python, network-programming, tcp, tcpdump, graylog3

tcpdump does not capture all packets at high packet frequency


Let's say there is some unencrypted TCP traffic whose packets you would like to eavesdrop on and forward somewhere else. You don't want to interrupt the traffic or interfere with any part of it, just eavesdrop on it.

In my case, GELF logs are sent over TCP to my Graylog (let's call it Graylog1), and I want the exact same logs to go to a second, testing Graylog (let's call it Graylog2).

This traffic carries around 5,000 packets per second, but it can grow to 15,000 packets per second, and in the future even more. So I need a solution that is reasonably robust and able to capture every packet.

The best approach I have found so far is to run tcpdump together with a Python script: tcpdump captures the packets on a specific port (localhost:12200), and the Python script sends them to a new destination (localhost:12205).

The tcpdump command I use:

```shell
sudo tcpdump -U -i lo -w - tcp dst port 12200 and 'len > 200' | sudo python3 sendFromStdin.py
```

Python script (sendFromStdin.py, which reads the packets from the pipe in the tcpdump command above and sends them on):

```python
import socket
import sys
import time

from scapy.all import PcapReader

target_host = "localhost"
target_port = 12205

packet_sent = 0


def read_and_forward_packets():
    global packet_sent

    # Read the pcap stream that tcpdump writes to stdout (-w -).
    with PcapReader(sys.stdin.buffer) as pcap_reader:
        for packet in pcap_reader:
            # Forward only TCP packets that actually carry data.
            if packet.haslayer('TCP') and len(packet['TCP'].payload) > 0:
                gelf_message = bytes(packet['TCP'].payload).rstrip(b'\n')
                # One new TCP connection per packet (my workaround for dropped
                # connections); this is slow at thousands of packets per second.
                with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
                    s.connect((target_host, target_port))
                    s.sendall(gelf_message)
                    packet_sent += 1
                    # Throttle: limits forwarding to at most ~1000 packets/s.
                    time.sleep(0.001)
                print(f"Sent packet to {target_host}:{target_port}", packet_sent)


if __name__ == "__main__":
    try:
        read_and_forward_packets()
    except KeyboardInterrupt:
        print("\nInterrupted by user, shutting down.")
    except ConnectionRefusedError as e:
        print("Graylog input is probably down.", e)
```
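A possible alternative design for the forwarder, sketched under assumptions: the hypothetical class `GelfForwarder` below keeps one TCP connection open and reconnects only when a send fails, instead of connecting once per packet. It buffers payload bytes and splits them on `\n`, because one captured packet may hold several logs or only part of one, and it terminates each message with a null byte, which the Graylog GELF TCP documentation describes as the delimiter for several messages on one connection (check your input's settings). It assumes the capture contains a single sending connection and does not deduplicate retransmitted segments.

```python
import socket


class GelfForwarder:
    """Hypothetical helper: forward newline-delimited GELF logs over ONE reused
    TCP connection, null-terminating each message (assumed Graylog GELF TCP framing).
    """

    def __init__(self, host, port):
        self.host = host
        self.port = port
        self.sock = None
        self.pending = b""  # start of a log that was split across packets

    def _send_frame(self, frame):
        for attempt in range(2):
            try:
                if self.sock is None:
                    self.sock = socket.create_connection((self.host, self.port))
                self.sock.sendall(frame)
                return
            except OSError:
                # Connection refused/dropped/reset: discard it, retry once on a new one.
                self.close()
                if attempt == 1:
                    raise

    def feed(self, payload):
        """Add one TCP payload and send every complete log. Returns how many were sent."""
        self.pending += payload
        # Everything before the last b"\n" is complete; the remainder waits for more data.
        *complete, self.pending = self.pending.split(b"\n")
        sent = 0
        for message in complete:
            if message:  # skip empty lines
                self._send_frame(message + b"\0")
                sent += 1
        return sent

    def close(self):
        if self.sock is not None:
            self.sock.close()
            self.sock = None
```

In the script above, the forwarder would be created once before the loop (`forwarder = GelfForwarder(target_host, target_port)`), and the per-packet socket code would reduce to `forwarder.feed(bytes(packet['TCP'].payload))`.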

Right now it doesn't matter whether I save the capture to a pcap file and then feed it to the Python script, or use stdin/stdout with a pipe as in the command above. The script itself is perhaps not even that important, since what I mainly want to talk about is tcpdump.

[Figure: left, what the traffic looks like; right, how I want to eavesdrop on it. All of this happens on localhost.]

Complications: When I looked in Wireshark, it seemed that one packet contains one log, at least for now. The problem is that tcpdump does not capture all the packets that go through the port it listens on. For example, I wrote another Python script that sends logs to port 12200 at around 5,000 packets per second, 110,000 logs in total. A few seconds after the script finished, I checked how many packets tcpdump had captured: around 80,000 of the original 110,000.

[Figure: top, output of the sending script; bottom, output of tcpdump.]
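One way to check whether the missing 30,000 are really lost is to count the logs inside the captured payloads, not just the packets. Below is a minimal stdlib-only sketch (the helpers `iter_tcp_payloads` and `count_packets_and_logs` are hypothetical, not part of tcpdump or scapy) that reads a classic pcap stream, such as a file written with `tcpdump -w` or the `-w -` pipe above, and counts newline-terminated logs. It assumes the Ethernet link type that tcpdump uses for Linux `lo`, IPv4 only, and that every log ends with `\n` as in the sending script; retransmitted segments would be counted twice.

```python
import struct

# Classic pcap magic numbers as they appear on disk -> struct byte order.
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}
LINKTYPE_ETHERNET = 1


def iter_tcp_payloads(stream):
    """Yield the TCP payload of each IPv4/TCP packet in a binary pcap stream."""
    header = stream.read(24)  # pcap global header
    endian = PCAP_MAGICS.get(header[:4])
    if len(header) < 24 or endian is None:
        raise ValueError("not a classic pcap stream")
    (linktype,) = struct.unpack(endian + "I", header[20:24])
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"unsupported link type {linktype}")
    while True:
        record = stream.read(16)  # per-packet record header
        if len(record) < 16:
            return
        _, _, incl_len, _ = struct.unpack(endian + "IIII", record)
        frame = stream.read(incl_len)
        # Skip anything that is not Ethernet + IPv4 (EtherType 0x0800).
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        ip = frame[14:]
        if ip[9] != 6:  # IP protocol 6 = TCP
            continue
        ihl = (ip[0] & 0x0F) * 4
        (total_len,) = struct.unpack("!H", ip[2:4])
        tcp = ip[ihl:total_len]
        if len(tcp) < 20:
            continue
        data_offset = (tcp[12] >> 4) * 4
        yield tcp[data_offset:]


def count_packets_and_logs(stream):
    """Return (data-carrying packets, newline-terminated logs inside them)."""
    packets = logs = 0
    for payload in iter_tcp_payloads(stream):
        if payload:
            packets += 1
            logs += payload.count(b"\n")
    return packets, logs
```

Opened on a capture file (for example `with open("capture.pcap", "rb") as f: print(count_packets_and_logs(f))`, with a hypothetical file name), a log count noticeably higher than the packet count means some packets carried more than one log.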

(In case you want to see it, here is the script that sends the logs to port 12200, i.e. the traffic I want to eavesdrop on):

```python
import socket
import time


def connect(server_ip, server_port):
    """Open a new TCP connection to the listening port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((server_ip, server_port))
    return sock


def send_logs(file_path, server_ip, server_port):
    try:
        sock = connect(server_ip, server_port)
        with open(file_path, 'r') as file:
            logs = file.readlines()  # each line keeps its trailing '\n'
        num = 0
        for log in logs:
            sent = False
            while not sent:
                try:
                    sock.sendall(log.encode('utf-8'))
                    sent = True
                    num += 1
                    print("GELF log number", num, "sent to", server_ip, server_port)
                    time.sleep(0.0001)
                except OSError:
                    # Connection dropped/reset: close it and retry this log on a new one.
                    sock.close()
                    sock = connect(server_ip, server_port)
            if num == 10000:
                # Deliberately reset the connection once, after 10,000 logs.
                sock.close()
                sock = connect(server_ip, server_port)
        sock.close()

    except Exception as e:
        print(f"An error occurred: {e}")


file_path = '/home/qkopj/testdata/testdata.gelf'
server_ip = 'localhost'
server_port = 12200

send_logs(file_path, server_ip, server_port)
```

(testdata.gelf contains 110,000 lines of test data in GELF format.)

Not to mention that when I use the tcpdump command together with the forwarding script, the result is even worse: tcpdump itself captures 80,000 of the 110,000 packets, and the Python script after the pipe sends only 4,000 of those 80,000. But, as I said before, the Python script is not the main problem I want to talk about here.

Question: I want to know whether there is some problem with tcpdump, for example whether it is limited by a maximum number of packets it can handle. If I speed up the traffic, tcpdump captures even fewer packets, and vice versa. As far as my research goes, tcpdump and other capture tools become very hardware-intensive beyond some point: every time I ran tcpdump on this traffic, my CPU (11th-gen Intel Core i7) went straight to 100%.

I keep telling myself "hackers must do this sort of thing somehow"...

I have already tried many things:

  • Using socat or netcat for sending instead of the Python script, but they wrap the packets they receive in another packet, so I am actually sending packets with two sets of headers. Not to mention that tcpdump still does not capture all packets.
  • Using Scapy's sniff function instead of tcpdump, but it turned out that sniff is even slower at capturing packets than tcpdump.
  • Quite often I also run into a problem where some of the things I work with (tcpdump, the scripts, Graylog...) suddenly drop or reset the TCP connection, throwing errors everywhere. (By the way, this is why I open a new TCP connection for each packet in sendFromStdin.py and reset the connection in the other script after 10,000 packets; again, not the main problem right now.)
  • Using other languages such as Java, but even if that solves sending the data from the tcpdump capture, it does not solve tcpdump not capturing everything.
  • Using Wireshark or tshark, but both give results similar to tcpdump's.

I use Ubuntu 22.04.4 LTS, Python 3.10.12, and tcpdump 4.99.1.


Solution

  • I kinda solved it myself.

    First of all, I would like to mention that tcpdump can be limited by hardware. But in the end I found that this was not my case (at least at 5,000 packets/s). I simply missed that Graylog, which processes the logs, was using around 80%+ of the CPU, so I mistakenly thought tcpdump was consuming that processing power. (But for future readers: yes, hardware can limit tcpdump's performance, especially at rates like 60,000+ packets/s.)

    Secondly, and most importantly, there is something called Nagle's algorithm, which, as far as I understand, aggregates small packets into bigger ones to improve TCP/IP efficiency: while earlier data is still unacknowledged, the sender's TCP stack holds back small writes and sends them together in one segment. So several small writes can end up merged into one packet. In my case, this aggregation happened in the script that sends logs to port 12200. In Python, it can be disabled with `sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)`, which prevents the aggregation. (In practice, unless you are doing low-level work with the packets, I do not see much reason to disable it in your scripts.)

    This discovery showed that sending 110,000 logs and capturing 80,000 packets does not necessarily mean anything was lost; it can simply be packet aggregation. So tcpdump was doing its job perfectly all along. I just gave it fewer packets than I thought, because some of them contained several merged logs.
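A minimal sketch of the fix described above, assuming a sender like the one in the question: the hypothetical `send_logs_nodelay` sets `TCP_NODELAY` on the sending socket and pushes logs to a throwaway local listener on an OS-chosen port (standing in for port 12200). Even with Nagle's algorithm disabled, TCP is a byte stream and does not preserve write boundaries, so the receiving side still splits the stream on the newline delimiter instead of assuming one write per packet.

```python
import socket
import threading


def receive_all(server_sock, result):
    """Accept one connection and collect every recv() chunk until EOF."""
    conn, _ = server_sock.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(65536)
            if not data:
                break
            chunks.append(data)
    result["chunks"] = chunks


def send_logs_nodelay(logs, host="127.0.0.1"):
    """Hypothetical demo: send logs with Nagle disabled, return the logs received."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result = {}
    receiver = threading.Thread(target=receive_all, args=(server, result), daemon=True)
    receiver.start()

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.connect((host, port))
        # Disable Nagle's algorithm: small writes are sent right away instead of
        # being held back while earlier data is still unacknowledged.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for log in logs:
            sock.sendall(log.encode("utf-8"))
    receiver.join()
    server.close()

    # recv() chunk boundaries need not match the writes, so re-frame on b"\n".
    stream = b"".join(result["chunks"])
    return [line.decode("utf-8") for line in stream.split(b"\n") if line]
```

With `TCP_NODELAY` set, small writes usually leave as separate segments, which brings tcpdump's packet count closer to the number of logs, but TCP itself makes no such promise, so counting logs inside the payloads remains the reliable check.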