I know there are several programs out there that will sync files over the network. None of them does what I have in mind. Let me explain what I want to achieve...
On my network several computers share the same files. For example, the QuickBooks file is accessed by several computers, and it is a large file; the Outlook PST files are large as well. Every night we create a backup over the network of the files that have changed. It does not make sense to copy a whole 1 GB file when it has only had some minor modification, so I want to come up with an algorithm that compares parts of files.
For example, let's say the Outlook PST file consists of the bytes:
1, 2, 3, 4, 5, 6, 7, 8, 9
If I receive an email, the bytes might become, for example:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Instead of sending the whole file, it would be much cheaper to send just the byte 10.
In reality the file is of course much larger, so I will compute the checksum of every megabyte of the file. My table of checksums would then look like:
aaa1, aaa2, aaa3, abf8, etc...
If, after receiving an email, the PST file's table is now:
aaa1, aaa2, aaa3, 7a8b, etc ...
then I know that the first 3 megabytes are the same and I should send just one megabyte instead of the entire file...
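In Python, a minimal sketch of that scheme might look like the following (the 1 MB block size, MD5 as the checksum, and the file paths are just illustrative choices):

    import hashlib

    BLOCK_SIZE = 1024 * 1024  # 1 MB blocks, as described above

    def block_checksums(path):
        """Checksum of every fixed-size block of the file, in order."""
        sums = []
        with open(path, "rb") as f:
            while True:
                block = f.read(BLOCK_SIZE)
                if not block:
                    break
                sums.append(hashlib.md5(block).hexdigest())
        return sums

    def blocks_to_send(old_sums, new_sums):
        """Indices of blocks that differ or exist only in the new file."""
        return [i for i, s in enumerate(new_sums)
                if i >= len(old_sums) or old_sums[i] != s]

    # Placeholder paths: compare yesterday's backup against today's file.
    changed = blocks_to_send(block_checksums("backup/outlook.pst"),
                             block_checksums("current/outlook.pst"))
    print("blocks to send:", changed)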
I think this algorithm will work great if content is added towards the end of the file, but in reality a byte may be changed at the beginning of the file, and then my algorithm is not going to work: for example, if one byte is inserted at the beginning of the file, every block boundary shifts and all the checksums change...
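To make that concrete, a tiny demonstration (the sample data is made up):

    import hashlib

    original = bytes(range(256)) * 8192  # 2 MB of sample data
    shifted = b"\x00" + original         # same data, one byte inserted at the front

    def sums(data, block=1024 * 1024):
        return [hashlib.md5(data[i:i + block]).hexdigest()
                for i in range(0, len(data), block)]

    # Every block checksum changes, even though almost all the data is shared.
    print(sums(original))
    print(sums(shifted))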
How can I make the algorithm more efficient? It would be nice if I could send parts of the file instead of the whole file.
The rsync protocol will efficiently synchronise large files with small differences. It is much cleverer than the scheme you envisage: in particular, its rolling checksum finds matching blocks at any offset, which solves exactly the shifted-boundary problem you describe. You should either read Tridgell and Mackerras's write-up before embarking on your own solution, or just use rsync. There's a free Windows wrapper here.
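For a flavor of why rsync copes with inserted bytes: it uses a weak rolling checksum that can be slid through the file one byte at a time, so matching blocks are found at any offset rather than only at fixed boundaries. A rough Python sketch of such a rolling checksum (following the form described in the paper; the function names are mine, and real rsync also pairs this with a strong checksum to confirm matches):

    M = 1 << 16  # the weak checksum is computed modulo 2^16

    def weak_checksum(block):
        """Weak checksum of a block: a pair of running sums (a, b)."""
        a = sum(block) % M
        b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
        return a, b

    def roll(a, b, out_byte, in_byte, block_len):
        """Slide the window forward one byte in O(1):
        drop out_byte from the front, append in_byte at the back."""
        a = (a - out_byte + in_byte) % M
        b = (b - block_len * out_byte + a) % M
        return a, b

    # Sliding the window gives the same result as recomputing from scratch.
    data = b"the quick brown fox jumps over the lazy dog"
    L = 16
    a, b = weak_checksum(data[:L])
    for k in range(1, len(data) - L + 1):
        a, b = roll(a, b, data[k - 1], data[k + L - 1], L)
    assert (a, b) == weak_checksum(data[-L:])

One side computes a table of (weak, strong) checksums for the blocks of its copy; the other slides this window over its own copy, and wherever the weak checksum hits a table entry (and the strong checksum confirms it) a short block reference is sent instead of the raw bytes.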