version-control dvcs large-files

Is there a distributed VCS that can manage large files?


Is there a distributed version control system (Git, Bazaar, Mercurial, Darcs, etc.) that can handle files larger than available RAM?

I need to be able to commit large binary files (e.g. datasets, source video/images, archives). I don't need to diff them, only to commit them and then update when a file changes.

I last looked at this about a year ago, and none of the obvious candidates allowed this, since they're all designed to diff in memory for speed. That left me with a VCS for managing code and something else ("asset management" software or just rsync and scripts) for large files, which is pretty ugly when the directory structures of the two overlap.


Solution

  • It's been 3 years since I asked this question, but as of version 2.0, Mercurial includes the largefiles extension, which accomplishes what I was originally looking for:

    The largefiles extension allows for tracking large, incompressible binary files in Mercurial without requiring excessive bandwidth for clones and pulls. Files added as largefiles are not tracked directly by Mercurial; rather, their revisions are identified by a checksum, and Mercurial tracks these checksums. This way, when you clone a repository or pull in changesets, the large files in older revisions of the repository are not needed, and only the ones needed to update to the current version are downloaded. This saves both disk space and bandwidth.
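
The checksum idea in the quoted description can be illustrated with a minimal Python sketch. This is not Mercurial's implementation: the function names (`file_checksum`, `track_large_file`, `restore_large_file`), the `store` and `standins` directories, and the choice of SHA-1 are all assumptions made for illustration. It only shows the general technique: hash the file in fixed-size chunks so it never has to fit in RAM, keep the content in a store keyed by checksum, and let a tiny stand-in file carry the checksum in place of the data.

```python
import hashlib
import shutil
from pathlib import Path

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so memory use stays bounded


def file_checksum(path: Path) -> str:
    """Return the SHA-1 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track_large_file(path: Path, store: Path, standins: Path) -> str:
    """Copy `path` into a checksum-keyed store and write a small stand-in file.

    Only the stand-in (a one-line text file holding the checksum) would be
    committed to the VCS; `store` and `standins` are hypothetical directories.
    """
    checksum = file_checksum(path)
    store.mkdir(parents=True, exist_ok=True)
    standins.mkdir(parents=True, exist_ok=True)
    blob = store / checksum
    if not blob.exists():  # identical content is stored only once
        shutil.copyfile(path, blob)
    (standins / path.name).write_text(checksum + "\n")
    return checksum


def restore_large_file(name: str, store: Path, standins: Path, dest: Path) -> None:
    """Recreate a large file at `dest` from the checksum in its stand-in."""
    checksum = (standins / name).read_text().strip()
    shutil.copyfile(store / checksum, dest)
```

Under these assumptions, updating to a revision only requires fetching the blobs whose checksums appear in that revision's stand-ins, which is why older versions of the large files need not be downloaded.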