Tags: ruby, tcp, network-programming, packet-sniffers, tcpdump

Differentiate between TCP connections using Seq No, Ack No and/or data size


I am aggregating connections by going through a packet dump collected with tcpdump. My code is in Ruby. The code differentiates between connections using the 4-tuple (SrcIP, SrcPort, DstIP, DstPort). If two connections are between the same machines, with the same IPs and the same ports, they are differentiated as follows:

1. If more than 2 hours have passed since the previous packets, it is a new connection.
2. If a FIN or an RST has already been seen, the new packet is from a new connection.
3. If the number of SYNs is more than two (one in each direction), it is a new connection.
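For reference, the three rules above can be sketched roughly as follows. This is only an illustration, not the asker's actual code: the `Packet` struct, its field names, and the `ConnectionTracker` class are hypothetical. Rule 2 is applied literally, so the trailing packets of a normal close would need extra handling in practice.

```ruby
# Hypothetical packet record; the field names are assumptions for illustration.
Packet = Struct.new(:time, :src_ip, :src_port, :dst_ip, :dst_port,
                    :syn, :fin, :rst, keyword_init: true)

IDLE_LIMIT = 2 * 60 * 60 # two hours, in seconds

# Direction-independent key, so both directions of a flow share one entry.
def flow_key(pkt)
  [[pkt.src_ip, pkt.src_port], [pkt.dst_ip, pkt.dst_port]].sort
end

class ConnectionTracker
  State = Struct.new(:id, :last_seen, :syn_count, :closed)

  def initialize
    @flows = {}
    @next_id = 0
  end

  # Returns the id of the connection that the packet is assigned to.
  def assign(pkt)
    key = flow_key(pkt)
    state = @flows[key]
    if state.nil? || new_connection?(state, pkt)
      @next_id += 1
      state = @flows[key] = State.new(@next_id, pkt.time, 0, false)
    end
    state.syn_count += 1 if pkt.syn
    state.closed = true if pkt.fin || pkt.rst
    state.last_seen = pkt.time
    state.id
  end

  private

  def new_connection?(state, pkt)
    return true if pkt.time - state.last_seen > IDLE_LIMIT # rule 1: idle gap
    return true if state.closed                            # rule 2: FIN/RST seen
    return true if pkt.syn && state.syn_count >= 2         # rule 3: extra SYN
    false
  end
end
```

Each call to `assign` returns a connection id; packets that receive the same id are treated as belonging to the same connection.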

The situation I am not able to address is the following: a new connection between the same two hosts (having the same 4-tuple) happens within 2 hours, tcpdump dropped the previous RST or FIN packets, and it also dropped 2 or more SYN packets from the two connections. In that case none of the conditions above will work, and the only information that remains is the time of the new set of packets, the Seq Nos, the Ack Nos and the data sizes. Using just this information, could I figure out whether the connection is a new one or an old one?

I tried to see if there is a pattern in the sequence numbers, or between the SeqNo and the AckNo, but none seems to be definitive.


Solution

  • Because TCP (primarily) uses a sliding acknowledgement window, the SeqNo and AckNo of a single connection will be monotonically increasing in each direction (retransmissions and reordered packets aside) until they wrap around at 2^32 due to integer overflow.

    Also, the SeqNo from one direction of traffic corresponds to the AckNo of the other direction of traffic, providing another invariant that you can check.

    One complicating factor is that the initial SeqNo values are chosen to be effectively random, to make sequence-number prediction (spoofing) attacks harder; so a new session with otherwise identical parameters might pick initial sequence numbers that are larger than the previously seen sequence numbers, and confuse your algorithm.
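The checks described above could be sketched in Ruby as follows. This is a heuristic under stated assumptions, not a definitive test: the helper names and the `MAX_JUMP` tolerance are hypothetical, and a randomly chosen initial sequence number can still land inside the tolerance by chance.

```ruby
SEQ_MOD = 2**32 # TCP sequence numbers are 32-bit and wrap around

# Signed distance from a to b in modulo-2^32 space.
# Positive when b is ahead of a, negative when b is behind a.
def seq_delta(a, b)
  d = (b - a) % SEQ_MOD
  d >= SEQ_MOD / 2 ? d - SEQ_MOD : d
end

# Sequence number expected after a segment: the payload length advances it,
# and SYN and FIN each consume one sequence number.
def next_seq(seq, data_len, syn: false, fin: false)
  (seq + data_len + (syn ? 1 : 0) + (fin ? 1 : 0)) % SEQ_MOD
end

# Assumed tolerance (1 MiB) for gaps caused by dropped or reordered packets.
# This value is an illustration, not a TCP constant; tune it to your capture.
MAX_JUMP = 1 << 20

# True if `observed` is close enough to `expected` to plausibly belong to
# the same connection. Use it in two ways:
#   - same direction:  expected = next_seq of the last segment, observed = new SeqNo
#   - cross direction: expected = peer's next_seq,              observed = new AckNo
def plausible_continuation?(expected, observed, max_jump = MAX_JUMP)
  seq_delta(expected, observed).abs <= max_jump
end
```

A packet whose SeqNo and AckNo both fail `plausible_continuation?` against the tracked state is a strong hint of a new connection. A packet that passes is only weak evidence of the old one: if initial sequence numbers were uniformly random, a new connection would pass a 1 MiB tolerance on either side by chance about once in 2048 times per direction.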