Tags: python, version-control, compression, diff, rcs

Single-file history format/library for binary files?


My application is going to edit a bunch of large files, completely unrelated to each other (belonging to different users), and I need to store checkpoints of the previous state of the files. Delta compression should work extremely well on this file format. I only need a linear history, not branches or merges.

There are low-level libraries that provide part of the solution; for example, xdelta3 sounds like a good binary diff/patch system.

RCS actually seems like a pretty close match to my problem, but doesn't handle binary files well.

git provides a complete solution to my problem, but is an enormous suite of programs, and its storage format is an entire directory.

Is there anything less complicated than git that would:

  • work on binary files
  • perform delta compression
  • let me commit new "newest" versions
  • let me recall old versions

Bonus points if it would:

  • have a single-file storage format
  • be available as a C, C++, or Python library

I can't even find the right combination of words to google for this category of program, so that would also be helpful.
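For comparison, the requirements listed above (binary data, delta compression, commit a new newest version, recall old ones, single storage file, usable from Python) can be met by a small do-it-yourself store. What follows is a minimal sketch, assuming Python's standard sqlite3 module is acceptable as the single-file container. The names History, make_delta, apply_delta and BLOCK are hypothetical, and the greedy block-matching delta is a simple stand-in, not xdelta3's algorithm. Like RCS on its trunk, it keeps the newest revision in full and stores each older revision as a reverse delta.

```python
import sqlite3
import struct
import zlib

BLOCK = 32  # hypothetical match granularity in bytes


def make_delta(base: bytes, target: bytes) -> bytes:
    """Encode `target` as COPY ops (ranges of `base`) and INSERT ops (literal bytes)."""
    index = {}
    for off in range(0, len(base) - BLOCK + 1, BLOCK):
        index.setdefault(base[off:off + BLOCK], off)  # aligned blocks of base
    ops = bytearray()

    def insert(chunk: bytes) -> None:
        ops.extend(b"I" + struct.pack(">Q", len(chunk)) + chunk)

    pos = literal_start = 0
    while pos + BLOCK <= len(target):
        off = index.get(target[pos:pos + BLOCK])
        if off is None:
            pos += 1  # no indexed block starts here; slide one byte
            continue
        length = BLOCK  # extend the match as far as the bytes keep agreeing
        while (pos + length < len(target) and off + length < len(base)
               and target[pos + length] == base[off + length]):
            length += 1
        if pos > literal_start:
            insert(target[literal_start:pos])
        ops.extend(b"C" + struct.pack(">QQ", off, length))
        pos = literal_start = pos + length
    if literal_start < len(target):
        insert(target[literal_start:])
    return zlib.compress(bytes(ops))


def apply_delta(base: bytes, delta: bytes) -> bytes:
    """Rebuild the target bytes from `base` and a delta made by make_delta."""
    ops = zlib.decompress(delta)
    out = bytearray()
    i = 0
    while i < len(ops):
        if ops[i:i + 1] == b"C":
            off, length = struct.unpack_from(">QQ", ops, i + 1)
            out += base[off:off + length]
            i += 17  # 1 tag byte + two 8-byte integers
        else:  # b"I"
            (n,) = struct.unpack_from(">Q", ops, i + 1)
            out += ops[i + 9:i + 9 + n]
            i += 9 + n  # 1 tag byte + 8-byte length + literal bytes
    return bytes(out)


class History:
    """Linear history of one file, stored in a single SQLite database file."""

    def __init__(self, path: str):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS rev (id INTEGER PRIMARY KEY,"
                        " is_full INTEGER NOT NULL, payload BLOB NOT NULL)")

    def _head(self):
        return self.db.execute(
            "SELECT id, payload FROM rev WHERE is_full = 1").fetchone()

    def commit(self, data: bytes) -> int:
        """Store `data` as the newest revision and return its revision id."""
        with self.db:  # one transaction for both statements
            head = self._head()
            if head is not None:
                old_id, old_payload = head
                old = zlib.decompress(old_payload)
                # replace the old full copy by a reverse delta: new bytes -> old bytes
                self.db.execute("UPDATE rev SET is_full = 0, payload = ? WHERE id = ?",
                                (make_delta(data, old), old_id))
            cur = self.db.execute("INSERT INTO rev (is_full, payload) VALUES (1, ?)",
                                  (zlib.compress(data),))
        return cur.lastrowid

    def checkout(self, rev_id: int) -> bytes:
        """Return the bytes of revision `rev_id` (ids start at 1)."""
        head = self._head()
        if head is None or not 1 <= rev_id <= head[0]:
            raise KeyError(rev_id)
        data = zlib.decompress(head[1])
        # walk back from the head, applying one reverse delta per older revision
        for (delta,) in self.db.execute(
                "SELECT payload FROM rev WHERE id >= ? AND id < ? ORDER BY id DESC",
                (rev_id, head[0])):
            data = apply_delta(data, delta)
        return data
```

Storing reverse deltas makes reading the newest version (the common case) a single decompression, while recalling an old version costs one delta application per step back. The byte-at-a-time scanning in pure Python is only meant to illustrate the idea; for very large files a dedicated delta library such as xdelta3 would be the practical choice.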


Solution

  • From the RCS manual (1. Overview):

    [RCS] can handle text as well as binary files, although functionality is reduced for the latter.

    RCS seems like a good option and is worth trying.

    I work for a foundation that has been using RCS to keep tens of thousands of completely unrelated files under version control (neither git nor hg is an option). Most of them are text, but some are media files, which are binary in nature.

    RCS does work quite well with binary files; just make sure keyword substitution is turned off (for example by putting the file in binary mode with rcs -kb), so that byte sequences that happen to look like $Id$ are not inadvertently substituted.

    To see whether this works for you, try it with, for example, a Photoshop image: put it under version control with RCS, then change part of it or add a layer, and commit the change. You can then check how well RCS manages your binary files.
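The experiment described above can be scripted. This is a hedged sketch that assumes the GNU RCS command-line tools (rcs, ci, co) are on the PATH; random bytes stand in for a Photoshop image, and the function name try_rcs_binary is hypothetical. It sets binary mode with -kb so that no keyword substitution happens, commits two revisions, recovers the first one with co -p, and reports the size of the resulting ,v file.

```python
import os
import shutil
import subprocess
import tempfile


def try_rcs_binary(size=1_000_000):
    """Put a binary file under RCS, change part of it, commit again, and report.

    Returns (old revision recovered intact?, size of the ,v file), or None
    when the RCS tools are not on PATH.
    """
    if shutil.which("ci") is None or shutil.which("rcs") is None:
        return None
    work = tempfile.mkdtemp()
    name = "image.bin"  # hypothetical stand-in for a real binary file such as a .psd

    def run(*args):
        return subprocess.run(args, cwd=work, check=True, capture_output=True)

    first = os.urandom(size)
    with open(os.path.join(work, name), "wb") as f:
        f.write(first)
    # -i creates the RCS file without a revision; -kb selects binary mode,
    # which also turns off keyword substitution such as $Id$.
    run("rcs", "-q", "-i", "-kb", "-t-binary test file", name)
    run("ci", "-q", "-l", "-mfirst version", name)  # revision 1.1, keep the lock

    second = bytearray(first)
    second[size // 2:size // 2 + 100] = os.urandom(100)  # "change a part"
    with open(os.path.join(work, name), "wb") as f:
        f.write(second)
    run("ci", "-q", "-l", "-msecond version", name)  # revision 1.2

    old = run("co", "-q", "-p", "-r1.1", name).stdout  # revision 1.1 to stdout
    return old == first, os.path.getsize(os.path.join(work, name + ",v"))


if __name__ == "__main__":
    result = try_rcs_binary()
    if result is None:
        print("RCS (rcs/ci/co) not found on PATH")
    else:
        intact, history_size = result
        print(f"rev 1.1 intact: {intact}; both revisions take {history_size} bytes")
```

If the ,v file is only slightly larger than one copy of the data, RCS has stored the older revision as a small delta rather than a second full copy.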

    RCS has served us quite well. It is well maintained, reliable, and predictable, and definitely worth a try.