mercurial backup dropbox google-drive-api hardlink

Data corruption of a mercurial repository


I have a mercurial repository at c:\Dropbox\code. I've created a clone of this repo locally using:

hg clone -U c:\Dropbox\code c:\GoogleDrive\codeBackup

This bare repo serves only as a backup. I regularly push changes to codeBackup. Furthermore, both directories are backed up in the cloud (Dropbox and Google Drive, respectively).

If my repo in code becomes corrupt, would the codeBackup repo automatically be corrupt too, since the clone operation used hard links to the original repo? Would that make my double-cloud-backup strategy useless?

P.S.: I understand that the fallback option is to use the cloud service to restore a previous known-good state.


UPDATE: After digging around, I'm adding these for reference:

The problem is, if a 'hg clone' was done (without the --pull option), then the destination and the source repo share files inside .hg/store by using hardlinks, if the filesystem provides the hardlinking feature (NTFS does).

Mercurial is designed to break such hardlinks inside .hg if a commit or push is done to one of the clones. The prerequisite for this is that the Windows API Mercurial uses gives a correct answer when Mercurial asks "how many hardlinks are on this file?".

We found out that this answer is almost always wrong (always reporting 1, even if it is in fact >1) iff the hg process is running on one Windows computer and the repository files are on a network share on a different Windows computer.

  • To avoid hardlinks (use --pull):

    hg clone -U --pull c:\Dropbox\code c:\GoogleDrive\codeBackup

  • To check for hardlinks:

    fsutil hardlink list <file> : Shows all hardlinks for <file>

    find . -links +1 : Shows all files with more than one hard link (Unix-like shells)

    ls -l : Shows the hard link count next to each file (Unix-like shells)
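
For a check that works the same way on Windows and Unix-like systems, the link count can also be read from Python. This is only a minimal sketch: the function name find_hardlinked_files is made up for illustration, and it assumes the Python build reports st_nlink correctly for the filesystem in question (which, per the quote above, may not hold on a Windows network share).

```python
import os


def find_hardlinked_files(root):
    """Return sorted (path, link_count) pairs for files under root
    that have more than one hard link.

    Hypothetical helper for illustration; not part of Mercurial.
    """
    linked = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # st_nlink is the number of directory entries pointing at this file
            nlink = os.lstat(path).st_nlink
            if nlink > 1:
                linked.append((path, nlink))
    return sorted(linked)
```

Calling find_hardlinked_files(r"c:\GoogleDrive\codeBackup\.hg\store") should return an empty list if the clone was made with --pull, and a non-empty list if store files are still shared with the source repo.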


Solution

  • The only way your code repository can become corrupt (assuming it was not corrupt when you initially cloned it over to codeBackup) is when you write something to it, be it committing, rewriting history, etc. Whenever something is about to be written to a hard-linked file, Mercurial first breaks the hard link by creating an independent copy of the file, and only then modifies that newly created copy.

    So, to answer your question: under normal usage, repository corruption will not propagate to your codeBackup repository.
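
The break-the-link-before-writing idea can be illustrated with a short Python sketch. This is not Mercurial's actual code: the function write_breaking_hardlink is a hypothetical name, and it only shows the general copy-then-modify pattern, assuming the reported link count is accurate.

```python
import os
import shutil
import tempfile


def write_breaking_hardlink(path, data):
    """Append data to path, first giving path its own private copy
    if the file is currently shared through a hard link.

    Hypothetical illustration of the pattern; not Mercurial's implementation.
    """
    if os.lstat(path).st_nlink > 1:
        directory = os.path.dirname(os.path.abspath(path))
        # Create the copy in the same directory so the rename stays on one volume
        fd, tmp = tempfile.mkstemp(dir=directory)
        os.close(fd)
        shutil.copyfile(path, tmp)
        # After this, path names the new copy; other links keep the old content
        os.replace(tmp, path)
    with open(path, "ab") as f:
        f.write(data)
```

With a file linked into both repos, a write through this function changes only the copy at path, and the other link keeps its old bytes. If the link count is misreported as 1, the copy step is skipped and the write goes to the shared file, which is the failure mode described in the update above.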