Shortening TCP payload with Scapy results in [TCP Previous segment not captured]

I'm trying to modify a TCP payload by stripping out some bytes.

As long as the bytes are replaced with other bytes of the same length instead of being stripped out, modifying the packet works fine.

If the bytes are stripped out, Wireshark shows a [TCP Previous segment not captured] message in the dump.

I delete both checksums and the packet length of the modified packet so that Scapy recalculates all of them when sending the packet:

# Delete the old checksums
del packet_mod[IP].chksum
del packet_mod[TCP].chksum

# Delete the old packet length
del packet_mod[IP].len

The modification works if I also cut len(stripped_bytes) bytes off the end of the modified packet, because the receiver then appends the retransmitted TCP segment to the modified packet.

E.g.: I strip out 20 bytes of the TCP payload. The modification then only works if I also cut off an additional 20 bytes at the end of the payload.

What am I missing?

Solution

I don't understand what this part means:

E.g.: I strip out 20 bytes of the TCP payload. The modification then only works if I also cut off an additional 20 bytes at the end of the payload.

Anyway, the thing you're missing is that each TCP segment carries a TCP header field -- the "sequence number" field -- that indicates the position of this segment's data content in the stream of bytes that is being transferred through TCP.

The TCP receiver uses segment sequence numbers and segment lengths (the segment length is computed from the lengths of the IP datagrams that delivered the segment) to build a continuous byte stream from the received traffic. For example, if the receiver has previously collected all data up to sequence position 200 and the next incoming segments look like this:

  (segment 1) sequence=200 length=80 data='data data data ...'
  (segment 2) sequence=280 length=60 data='more data ...'
  (segment 3) sequence=340 length=70 data='even more data ...'

then the receiver knows that it has now collected all of the data up to (but not including) position 410. Since there are no gaps, this data is ready to be passed up to the application.

Note that the segment numbers (1), (2), (3) are not present in the TCP header. Those numbers are only there so that this description can refer to them.
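
The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration, not how a real TCP stack is written: the function name `next_expected` and the `(sequence, length)` tuples are made up for this answer, and sequence-number wraparound is ignored.

```python
def next_expected(start, segments):
    """Return the first sequence position not yet received contiguously.

    start    -- position up to which all data has already been collected
    segments -- iterable of (sequence, length) pairs, in any order
    """
    expected = start
    for seq, length in sorted(segments):
        if seq > expected:
            # A gap: nothing beyond this point is contiguous yet.
            break
        # The segment starts at or before the expected position, so it
        # extends the contiguous range (max() tolerates overlaps).
        expected = max(expected, seq + length)
    return expected


segments = [(200, 80), (280, 60), (340, 70)]
print(next_expected(200, segments))  # 410
```

Everything below the returned position is contiguous and could be handed to the application.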

Obviously, if segment (2) had been lost and the receiver had only collected segments (1) and (3) then the receiver would know that there was a gap in the received data. And it would know exactly where that gap was: it's missing 60 bytes starting at position 280. TCP promises to deliver a complete in-order stream of data, so until that gap is filled in, the receiver is not allowed to deliver any later bytes (like the 70 bytes at position 340 that it got in segment 3) to the application. If the missing bytes don't arrive very soon then the receiver will tell the sender about the gap and the sender will retransmit the missing data.

This is why removing bytes from a TCP segment causes problems. If your program removed 20 bytes from segment (2) then the receiver would see this:

  (segment 1) sequence=200 length=80 data='data data data ...'
  (segment 2) sequence=280 length=40 data='more data ...'
  (segment 3) sequence=340 length=70 data='even more data ...'

and the receiver would conclude that it had discovered a gap of 20 bytes at position 320.
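
To make the arithmetic concrete, here is a small Python sketch that lists the gaps a receiver (or Wireshark) would infer from a set of segments. The helper `find_gaps` and the tuple layout are hypothetical names for this example, and wraparound of sequence numbers is again ignored.

```python
def find_gaps(start, segments):
    """Return a list of (position, size) gaps in the received data.

    start    -- position up to which all data has already been collected
    segments -- iterable of (sequence, length) pairs, in any order
    """
    gaps = []
    expected = start
    for seq, length in sorted(segments):
        if seq > expected:
            # Bytes from `expected` up to `seq` were never seen.
            gaps.append((expected, seq - expected))
        expected = max(expected, seq + length)
    return gaps


original = [(200, 80), (280, 60), (340, 70)]
edited = [(200, 80), (280, 40), (340, 70)]  # 20 bytes cut from segment (2)

print(find_gaps(200, original))  # []
print(find_gaps(200, edited))    # [(320, 20)]
```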

If Wireshark is observing this traffic then it will reach the same conclusion. Neither the TCP receiver nor Wireshark knows that the cause of the missing bytes is that segment (2) was edited. Wireshark's most reasonable guess is that the missing bytes were in a segment that somehow wasn't made available for inspection, and that's why it shows the "Previous segment not captured" message. It says "previous" because it doesn't discover that there's a gap until it examines segment (3), the one after the gap.

The receiver will handle this in the same way that it handles any gap. It will tell the sender about the gap and wait for the missing data to be retransmitted. If the receiver gets the missing data then it will fill in the gap and then continue as usual. If you keep intercepting the retransmission and removing the missing bytes then the receiver will continue to report that it has a gap, and the sender will eventually tire of retransmitting the missing data and it will abandon the TCP connection.

This means that you can't simply remove data from an in-flight TCP segment and expect TCP to not notice that data has gone missing. In principle you could do it by deleting some data and then manipulating the sequence number in all of the later segments from the sender and in all of the acknowledgements sent by the receiver, but that's a much larger task.
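
As a rough sketch of the arithmetic that such a rewrite would involve (not a working interceptor), the two adjustments look like this in Python. The function names are made up for this example. It assumes a single cut of `removed` bytes, that the functions are applied only to segments and acknowledgements that come after the cut, and it ignores TCP options such as SACK blocks, which would need the same treatment. In Scapy the results would be written to the `seq` and `ack` fields of the TCP layer, with the checksums deleted for recalculation as in the question.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def shift_seq(seq, removed):
    """Sequence number to put on a later sender segment after the cut."""
    return (seq - removed) % SEQ_SPACE


def shift_ack(ack, removed):
    """Acknowledgement number to show the sender for a receiver ACK past the cut."""
    return (ack + removed) % SEQ_SPACE


# Segment (3) from the example, after 20 bytes were cut from segment (2):
print(shift_seq(340, 20))  # 320, so the receiver sees no gap

# The receiver then acknowledges everything up to 390, but the sender
# is waiting for an acknowledgement of 410:
print(shift_ack(390, 20))  # 410
```

Every later segment in that direction, and every acknowledgement coming back, has to be adjusted this way for the lifetime of the connection.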

References: The basic TCP sequence number mechanism is described in RFC 793 (since obsoleted by RFC 9293), and a longstanding best practice for using sequence numbers to improve security was described in RFC 1948 and formalised as a standard in RFC 6528.