I'm creating a Windows service that runs when a specific USB key is plugged in. What it does is simple: contact an FTP server, download some files, and store them in an (encrypted) archive on the USB. The archive can be opened read-only with a tool provided to the client (but that's irrelevant to my problem).
The service is used to keep the USB in sync with the master server (pretty much like Dropbox, but download-only, and the synchronized folders are on the removable media). The archive can grow up to a few gigabytes. About 1GB of the files are updated every week on the keys of around 400 users.
Since the entire update process is transparent to the user, there is a non-negligible chance that they will unplug the USB key while data is being written to the archive (even if I display some kind of screaming, flashy warning: DO NOT UNPLUG). A corrupted archive would have to be downloaded again in its entirety, which wastes a lot of bandwidth on the already loaded servers.
So basically I need the writes to the archive to be transactional. It's OK if they fail, as long as they do not leave the container in an inconsistent state. Either the file is entirely written or it is not written at all. A partially written file is acceptable as long as the container does not actually "see" it.
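To make the requirement concrete, here is a minimal Python sketch of the all-or-nothing behaviour I mean for a single file, using the common write-to-temporary-then-rename pattern. The name `atomic_write` and the `.part` suffix are made up for illustration, and whether the final rename is truly atomic on a FAT-formatted USB key is an assumption I have not verified.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` so readers see the old or the new content.

    Hypothetical helper: the bytes go to a temporary file in the same
    directory first, and only a final rename makes them visible.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Ask the OS to push the bytes to the device before the rename.
            os.fsync(tmp.fileno())
        # Swap the finished file into place; assumed (not guaranteed) to be
        # atomic on the target file system.
        os.replace(tmp_path, path)
    except BaseException:
        # On failure, drop the partial file; the old `path` is untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the key is pulled before the rename, only a stray `.part` file is left behind, which the service could delete on the next run. This covers one file replaced as a whole; it does not by itself give transactions inside a single large container.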
The question is: how can I guarantee data consistency at all times? Specifically, how do you make I/O operations work as transactions? What would you suggest? Should I implement something on my own, or are there existing containers that offer this functionality?
This is what I've got so far:
If this question is too general please move it to SU or something.
You may want to try using something like svn or git to download encrypted differences; they typically can be used to reconstruct a file locally if it gets corrupted. Or just download diffs and use patch to generate the latest file version.
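A lighter-weight variant of the same idea (this is not what svn or git do internally) is to compare each local file against a list of expected hashes and re-download only the files that fail the check. The sketch below assumes the server can publish such a manifest as a mapping from relative path to SHA-256 hex digest; the manifest format and the names `sha256_of` and `files_to_refetch` are hypothetical.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_refetch(root, manifest):
    """Return the relative paths that are missing or do not match.

    `manifest` is an assumed {relative_path: sha256_hex} mapping
    obtained from the server.
    """
    stale = []
    for rel_path, expected in sorted(manifest.items()):
        full_path = os.path.join(root, rel_path)
        if not os.path.isfile(full_path) or sha256_of(full_path) != expected:
            stale.append(rel_path)
    return stale
```

After an interrupted update, the service would call `files_to_refetch` on the key and fetch only the returned paths instead of the whole archive.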
You have other problems if the user unplugs a flash drive while it's in the process of writing data. Many flash drives are not reliable in that situation (at the flash block level, not the file system level) and can be corrupted to the point that a journaling file system like NTFS or EXT3 cannot recover. There's more detail here: https://superuser.com/questions/290060/can-flash-memory-be-physically-damaged-if-power-is-interrupted-while-writing