I was going through rsync utility and in the section: Determining which parts of a file have changed it says: The sender computes the rolling checksum for every chunk of size S in its own version of the file, even overlapping chunks.
Here comes the doubts:
abcdefgh
and s=4
than receiver will calculate checksum for abcd
and efgh
and send those to sender. And than sender will calculate checksum for: abcd
, bcde
, cdef
, defg
, efgh
(is this overlapping chunk) and compare with the receiver.If the recipient calculated rolling checksums for all the overlapping chunks, there would be almost as many checksums as the size of the file. So sending the checksums to the sender would be more expensive than sending the entire contents of the file, and there would be no savings. So the recipient splits the file into large, non-overlapping chunks, so it can send a small number of checksums and hashes.
The size S has to be the same on the sender and recipient. Otherwise, the checksums and hashes could never be expected to match.
A more detailed explanation of the algorithm can be found here. It makes it clear that S is the same on the sender and recipient.
Here's an example. Suppose the file on the recipient is:
abcdefghijkl
and S = 4. The recipient will send checksums and hashes for abcd
, efgh
, ijkl
to the sender.
Suppose one character X is inserted into the file on the sender:
abcdeXfghijkl
It will calculate checksums for:
abcd
bcde
cdeX
deXf
eXfg
Xfgh
fghi
ghij
hijk
ijkl
It will compare these to the checksums received from the recipient, and determine that abcd
and ijkl
have not changed. All it needs to send is eXfgh
, with instructions that this should replace the efgh
chunk.