How to sync a folder between a Node.js server and a Node.js application

A Node.js client application needs to sync a folder with a remote Node.js server. Both run on Windows. The sync only needs to be one-way, from server to client, and some way of knowing when it has completed would be good. Bandwidth is not a key consideration: an entire file could be re-downloaded if only part of it has changed. As for frequency, a batch update attempt every 15 minutes, for example, would be fine.

What approach or library would be preferable to, say, passing XML representations of the folder contents and downloading each changed file?

Thanks

Solution

The simplest way I can think of to write your own one-way sync of a single directory of files works as follows:

1. The client collects a list of the files it currently has, along with some identifying version information for each file (a version number, a CRC, or the file's original modification date/time on the server).
2. The client sends that list to the server in an Ajax (HTTP) request.
3. The server receives the client's file list and compares it to its own. It then returns three lists of files to the client: (a) existing files to update by downloading the newest version, (b) files on the client to remove, and (c) new files for the client to download. Lists (a) and (c) could be merged in some implementations, but it is sometimes useful to know which files are new.
4. The client works through those lists, downloading new and changed files and removing any files that should be removed.
5. When the client has finished, it can raise its own notification that the process has completed.
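The core of the steps above, building a file list with version information and comparing two such lists, can be sketched in Node.js as follows. This is a minimal sketch, not a finished implementation: it assumes a flat directory (no subfolders), uses a SHA-256 hash of each file's contents as the version identifier (a CRC or a server-assigned version number would slot in the same way), and the names `buildManifest` and `diffManifests` are hypothetical. The HTTP transport between client and server is left out.

```javascript
const fs = require("fs");
const path = require("path");
const crypto = require("crypto");

// Own-property check that is safe for any file name (even "constructor").
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Map each file name in a flat directory to a version identifier.
// Here the identifier is a SHA-256 hash of the file contents.
function buildManifest(dir) {
  const manifest = {};
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    if (!entry.isFile()) continue; // this sketch ignores subdirectories
    const data = fs.readFileSync(path.join(dir, entry.name));
    manifest[entry.name] = crypto.createHash("sha256").update(data).digest("hex");
  }
  return manifest;
}

// Server side: compare the client's manifest with the server's own and
// return the three lists: files to update, to remove, and new files.
function diffManifests(serverManifest, clientManifest) {
  const update = []; // on both sides, but the client's copy differs
  const remove = []; // on the client only: the client should delete it
  const added = [];  // on the server only: the client should download it
  for (const [name, version] of Object.entries(serverManifest)) {
    if (!has(clientManifest, name)) added.push(name);
    else if (clientManifest[name] !== version) update.push(name);
  }
  for (const name of Object.keys(clientManifest)) {
    if (!has(serverManifest, name)) remove.push(name);
  }
  return { update, remove, added };
}
```

For example, with a server manifest of `{ "a.txt": "v2", "b.txt": "v1", "c.txt": "v1" }` and a client manifest of `{ "a.txt": "v1", "b.txt": "v1", "old.txt": "v3" }`, `diffManifests` returns `{ update: ["a.txt"], remove: ["old.txt"], added: ["c.txt"] }`. Hashing contents means the client can rebuild its list at any time without storing version numbers, at the cost of reading every file on each run, which seems acceptable given that bandwidth and a 15-minute interval are not tight constraints.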

There are a couple of key aspects to this process. The first is some sort of identifying version information. The simplest scheme is for the server to keep a monotonically increasing version number for each file, incremented every time the file changes on the server. When the file is transferred to the client, the client also records that version number and must not lose it. If storing a separate version number is inconvenient, the file modification date/time can be used instead, but then the client must be very careful: whenever it updates one of its own files, it has to set that file's modification date/time to exactly the server's value rather than accept the time the file was written locally, because the local write time is not the last server modification time.
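If the modification date/time is used as the version, Node's `fs.utimesSync()` can stamp a downloaded file with the server's time instead of the local write time. The sketch below assumes, hypothetically, that the server's file list reports each file's modification time in milliseconds since the epoch; `writeWithServerMtime` and `isUpToDate` are illustrative names, not part of any library.

```javascript
const fs = require("fs");

// Write a downloaded file, then overwrite its timestamps with the server's
// modification time (milliseconds since the epoch).
function writeWithServerMtime(filePath, data, serverMtimeMs) {
  fs.writeFileSync(filePath, data);
  const mtime = new Date(serverMtimeMs);
  fs.utimesSync(filePath, mtime, mtime); // sets access and modification time
}

// A local file is current if its modification time matches the server's.
// The comparison uses whole seconds to sidestep sub-second precision
// differences between filesystems.
function isUpToDate(filePath, serverMtimeMs) {
  if (!fs.existsSync(filePath)) return false;
  const localMs = fs.statSync(filePath).mtimeMs;
  return Math.floor(localMs / 1000) === Math.floor(serverMtimeMs / 1000);
}
```

The important detail is the order of operations: the timestamp must be applied after the final write, since any later write to the file would reset its modification time to the local clock.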

Version numbers can also be stored in the filename as an identifiable suffix, such as core-scripts-v11. In this case the file's name to the outside world would be core-scripts, but it would be stored in the repository as core-scripts-v11 to indicate that it is version 11. When the file changes, the new version becomes core-scripts-v12. Any comparison against the client's file list then needs to compare the core name and the version separately, not just the raw filenames.
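A minimal sketch of comparing such versioned filenames, assuming stored names follow the `<core name>-v<number>` pattern from the example above with no file extension. The function names are hypothetical. Versions are compared as numbers: a plain string comparison would wrongly treat `images-v9` as newer than `images-v10`.

```javascript
// Split a stored name such as "core-scripts-v11" into its core name and
// numeric version. Names without a "-v<digits>" suffix get version 0.
function parseVersionedName(storedName) {
  const match = /^(.+)-v(\d+)$/.exec(storedName);
  if (!match) return { name: storedName, version: 0 };
  return { name: match[1], version: Number(match[2]) };
}

// Build a Map from core name to version for a list of stored names,
// keeping the highest version if several copies are present.
function toVersionMap(storedNames) {
  const versions = new Map();
  for (const stored of storedNames) {
    const { name, version } = parseVersionedName(stored);
    if (!versions.has(name) || versions.get(name) < version) {
      versions.set(name, version);
    }
  }
  return versions;
}

// Core names that the client lacks or holds in an older version.
function outdatedOnClient(serverNames, clientNames) {
  const server = toVersionMap(serverNames);
  const client = toVersionMap(clientNames);
  const outdated = [];
  for (const [name, version] of server) {
    if (!client.has(name) || client.get(name) < version) outdated.push(name);
  }
  return outdated;
}
```

For example, `outdatedOnClient(["core-scripts-v12", "styles-v3", "images-v10"], ["core-scripts-v11", "styles-v3", "images-v9"])` returns `["core-scripts", "images"]`.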

If you want an atomic sync operation, where a consistent set of files is always transferred and a client can never end up with part of a newer batch and part of an older batch, considerably more work is required. Files on the server must then be updated atomically so that a client in the middle of syncing with a prior version is not disrupted. This would most likely be done by maintaining several versions of the server repository, so that a client syncing with an existing version can finish syncing with it even while newer files are being installed. Again, there are many possible ways to solve this particular problem.
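One of those possible ways, sketched in memory only: the server keeps each published set of files as an immutable snapshot, a client pins the snapshot that is current when it starts and uses that snapshot id for every request, and old snapshots are discarded only once no client is still syncing against them. `SnapshotStore` and its methods are hypothetical; in a real server each snapshot would more likely be a separate directory on disk (for example one folder per snapshot id), and the id would travel with each HTTP request.

```javascript
// Keeps several immutable versions (snapshots) of the repository so that
// publishing new files never disturbs a client that is mid-sync.
class SnapshotStore {
  constructor() {
    this.snapshots = new Map(); // snapshot id -> frozen manifest
    this.pins = new Map();      // snapshot id -> number of clients syncing it
    this.currentId = 0;         // 0 means nothing has been published yet
  }

  // Publish a complete new set of files as a single unit.
  publish(manifest) {
    const id = this.currentId + 1;
    this.snapshots.set(id, Object.freeze({ ...manifest }));
    this.currentId = id; // clients starting from now on see the new set
    this.prune();
    return id;
  }

  // A client starts a sync and receives the snapshot id to use throughout.
  pin() {
    const id = this.currentId;
    this.pins.set(id, (this.pins.get(id) || 0) + 1);
    return id;
  }

  manifest(id) {
    const m = this.snapshots.get(id);
    if (!m) throw new Error(`unknown snapshot ${id}`);
    return m;
  }

  // The client has finished (or abandoned) its sync.
  release(id) {
    const count = (this.pins.get(id) || 0) - 1;
    if (count > 0) this.pins.set(id, count);
    else this.pins.delete(id);
    this.prune();
  }

  // Drop snapshots that are neither current nor in use by any client.
  prune() {
    for (const id of this.snapshots.keys()) {
      if (id !== this.currentId && !this.pins.has(id)) this.snapshots.delete(id);
    }
  }
}
```

For example, after `publish()` creates snapshot 1, a client calls `pin()` and receives 1; a second `publish()` makes snapshot 2 current, yet `manifest(1)` still returns the old file list for that client, and snapshot 1 is pruned only when the client calls `release(1)`. A real implementation would also need a timeout for clients that never call `release()`, otherwise their snapshots would be kept forever.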