svn, version-control, perforce, delta, vcdiff

Using delta algorithms in version control systems


I've been looking into different version control systems because I'm trying to develop a simple one for a CG studio. There are only a handful of tools out there that provide such services, like Perforce and SVN. Since we're mostly dealing with binary files in such studios, it sounds like a good idea to use delta algorithms and tools like bsdiff or xdelta to create delta files and reconstruct versions on request, which significantly reduces the required storage. However, when I looked into Perforce, for example, files apparently are not stored as deltas, and all I could find in the server repository were huge work and preview files. So I'm a bit confused. Using algorithms like xdelta does look enticing, so why aren't they used more often? Or am I wrong and haven't looked deep enough? Is there a fundamental risk to using these tools?

EDIT

It was pointed out that the question was unclear. What I'm asking is: is there a key limitation to using tools like xdelta (specifically xdelta3) for creating binary diffs and reconstructing files, such as corrupted diffs or failed reconstructions? I ask because the repo has not been maintained for over a decade.


Solution

  • Perforce used to store text files in a delta format (using the RCS format, which predates Perforce itself), and several years ago switched to storing even text files (which are usually very amenable to delta-based storage, more so than most binary formats) as full revisions by default.

    There are a bunch of reasons that delta storage can be unattractive:

    • In the "happy path" it might be more efficient, but in the edge cases it's much worse than a simpler solution, and if you're designing a tool that needs to work in many situations with arbitrary data and usage patterns, you generally want to plan for the worst-case scenario.
    • The limiting factors on delta storage (I/O and CPU speed) are harder to scale up than the limiting factors for full-revision (storage space).
    • Delta storage gets slower with more revisions; full-revision is unaffected by total revision count. (Some Perforce depots have files with thousands or even millions of revisions!)
    • A single on-disk file containing many revisions becomes a bottleneck when accessing and adding multiple revisions in parallel; this is not an unsolvable problem, but storing each revision individually makes it not a problem at all.
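
    To make the revision-count point concrete, here is a minimal Python sketch, not how Perforce or xdelta3 actually work. It assumes a toy copy/insert delta format built with the standard library's difflib (not VCDIFF), and the names make_delta, apply_delta, DeltaStore and FullStore are hypothetical. It also stores a SHA-256 digest per revision, which is one way to detect a corrupted delta when a revision is reconstructed.

```python
import hashlib
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Encode `new` as copy/insert instructions against `old` (toy format)."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reuse bytes from the old revision
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))  # literal bytes only in the new revision
        # "delete" needs no instruction: those bytes are simply never copied
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the newer revision from the older one plus its delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


class DeltaStore:
    """Forward delta chain: revision 0 in full, later revisions as deltas."""

    def __init__(self):
        self.base = None
        self.deltas = []
        self.digests = []
        self._head = None  # latest revision, cached so add() needs no replay

    def add(self, data: bytes) -> int:
        if self.base is None:
            self.base = data
        else:
            self.deltas.append(make_delta(self._head, data))
        self._head = data
        self.digests.append(hashlib.sha256(data).hexdigest())
        return len(self.digests) - 1

    def get(self, rev: int):
        """Return (data, number of deltas applied); the work grows with rev."""
        data = self.base
        for ops in self.deltas[:rev]:
            data = apply_delta(data, ops)
        if hashlib.sha256(data).hexdigest() != self.digests[rev]:
            raise ValueError(f"revision {rev} failed its checksum")
        return data, rev


class FullStore:
    """Full-revision storage: every revision kept whole."""

    def __init__(self):
        self.revisions = []

    def add(self, data: bytes) -> int:
        self.revisions.append(data)
        return len(self.revisions) - 1

    def get(self, rev: int):
        """Return (data, 0): a direct lookup, independent of revision count."""
        return self.revisions[rev], 0
```

    In this sketch, fetching revision n from DeltaStore replays n deltas, and a single damaged delta makes every later revision unrecoverable (the checksum at least turns that into an error instead of silently wrong data). FullStore pays in disk space but reads any revision in one step. Real systems soften the chain cost with reverse deltas or periodic full snapshots, which is extra machinery that full-revision storage avoids.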