
mercurial file id


Is there any way to get an immutable file ID for a file in repository?

I need an identifier that survives a file rename. For example, if the file Test01.txt is renamed to Test02.txt (using the TortoiseHg rename menu item or the hg rename command), I want an ID that corresponds to Test01.txt at revision 1 and to Test02.txt at revision 2.


Solution

  • Mercurial does not assign IDs to files. This is different from some other systems, such as Bazaar, where each file (and directory) has a unique ID that follows the file throughout its lifetime.

    The structure in a Mercurial repository is as follows:

    • each entry in the changelog has a single pointer to
      • an entry in the manifest, which has a pointer per file to
        • an entry in the file's filelog

    So if you add Test01.txt in revision 0, then you'll have a chain like this

    changelog@0 -> manifest@0 -> Test01.txt@0
    

    If you now rename the file and make a new commit, you create a new changelog entry and a new manifest entry, plus a new filelog for Test02.txt:

    changelog@1 -> manifest@1 -> Test02.txt@0
    

    The new Test02.txt filelog entry will reference the Test01.txt entry. This is how Mercurial can keep track of renames:

    $ hg debugdata Test02.txt 0
    
    copy: Test01.txt
    copyrev: 0936f74a58571dd87ad343cc3d6ae8434ad86fc4
    
    test01
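
The copy record shown above lives in a small metadata header at the start of the filelog revision. A minimal sketch of how it could be parsed, assuming the usual filelog layout in which the header is wrapped in two `\x01\n` markers and holds `key: value` lines (the function name `parse_filelog_revision` and the trailing newline in the sample content are assumptions made for illustration):

```python
def parse_filelog_revision(raw: bytes):
    """Split raw filelog revision data into (metadata dict, file content)."""
    marker = b"\x01\n"
    if not raw.startswith(marker):
        return {}, raw  # no metadata header: the data is plain file content
    # The header ends at the second marker.
    end = raw.index(marker, len(marker))
    meta = {}
    for line in raw[len(marker):end].split(b"\n"):
        if not line:
            continue
        key, _, value = line.partition(b": ")
        meta[key.decode()] = value.decode()
    return meta, raw[end + len(marker):]


# Sample bytes modelled on the debugdata output above (content is assumed).
raw = (
    b"\x01\ncopy: Test01.txt\n"
    b"copyrev: 0936f74a58571dd87ad343cc3d6ae8434ad86fc4\n"
    b"\x01\ntest01\n"
)
meta, content = parse_filelog_revision(raw)
print(meta["copy"], meta["copyrev"][:12])
```

The two marker lines are what show up as the seemingly empty lines around the copy and copyrev entries in the terminal output.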
    

    The best "file ID" you can talk about is therefore the ID of the first entry in the original filelog. You can dig it out with hg debugindex:

    $ hg debugindex Test01.txt
       rev    offset  length   base linkrev nodeid       p1           p2
         0         0       8      0       0 0936f74a5857 000000000000 000000000000
    

    The "nodeid" column gives you the IDs of the revlog entries in the filelog for Test01.txt. Here we see that the first revision of the file has ID 0936f74a5857. This is just a short, 12-character prefix of the full 40-character SHA-1 hash. If you need the full hash, then read on...
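
To illustrate where such a node ID comes from: in a revlog, the ID is a SHA-1 hash over the two parent IDs (sorted) followed by the revision text. The sketch below reproduces that scheme with Python's hashlib. The helper name `revlog_node` is made up for this example, and the file content `test01\n` is a guess, so the digest is not claimed to match the ID shown above.

```python
import hashlib

NULL_ID = b"\x00" * 20  # binary ID used for a missing parent


def revlog_node(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> str:
    """Hash the sorted parent IDs followed by the revision text."""
    first, second = sorted((p1, p2))
    digest = hashlib.sha1(first + second)
    digest.update(text)
    return digest.hexdigest()


full_id = revlog_node(b"test01\n")  # assumed content, no parents
short_id = full_id[:12]             # the form shown in the nodeid column
print(short_id, full_id)
```

Because the copy metadata is part of the hashed text, the first entry of a renamed file's filelog gets a different ID than the entry it was copied from, which is why the ID of the original entry is the one worth keeping.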

    The "linkrev" tells you that this version of the file is referenced by changeset 0. You can look up the data in that changelog entry with hg debugdata -c 0, but for our purposes the normal hg log command also has the information:

    $ hg log -r 0 --debug
    changeset:   0:8e62ecaada0e5ba9efec234d0d9a66583347becf
    phase:       draft
    parent:      -1:0000000000000000000000000000000000000000
    parent:      -1:0000000000000000000000000000000000000000
    manifest:    0:0537c846cd545da8f826b9d94fdb2fdae457bd07
    user:        Martin Geisler <[email protected]>
    date:        Thu Feb 02 09:00:18 2012 +0100
    files+:      Test01.txt
    extra:       branch=default
    description:
    01
    

    We're interested in the manifest ID. You can now look up the data in the correct manifest entry with:

    $ hg debugdata -m 0537c846cd545da8f826b9d94fdb2fdae457bd07
    Test01.txt0936f74a58571dd87ad343cc3d6ae8434ad86fc4
    

    There is really a NUL byte between the file name and the filelog ID, but it's not visible in your terminal. You now have the full filelog ID for the first revision of the Test01.txt file.
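
If you capture the manifest data in a script rather than reading it in a terminal, the NUL separator makes it easy to split. A minimal sketch, assuming each manifest line has the form file name, NUL byte, 40 hex digits, then optional flag characters (the function name `parse_manifest` is chosen for this example):

```python
def parse_manifest(data: bytes):
    """Map each file name to a (full filelog node ID, flags) pair."""
    entries = {}
    for line in data.split(b"\n"):
        if not line:
            continue  # skip the empty piece after the final newline
        name, _, rest = line.partition(b"\x00")
        node, flags = rest[:40], rest[40:]
        entries[name.decode()] = (node.decode(), flags.decode())
    return entries


# Bytes modelled on the debugdata -m output above, with the NUL made explicit.
sample = b"Test01.txt\x000936f74a58571dd87ad343cc3d6ae8434ad86fc4\n"
manifest = parse_manifest(sample)
print(manifest["Test01.txt"][0])
```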

    You also need to go back from Test02.txt to Test01.txt. You can use hg log --follow and hg debugrename for this: use hg log --follow to list the revisions that touch the file, and use hg debugrename to see what the file was renamed from at each step.
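
Once the rename steps are collected, walking back to the original name is a simple loop. The sketch below assumes you have already gathered the steps into a dictionary mapping each new name to its source name and copy revision. The dictionary layout and the function name `original_name` are hypothetical; this is not something Mercurial provides.

```python
def original_name(path, renames):
    """Follow rename records back to the first name the file had."""
    seen = {path}
    while path in renames:
        path, _copyrev = renames[path]
        if path in seen:
            # A name was reused; a plain name-based walk cannot resolve this.
            raise ValueError("rename records loop back to " + path)
        seen.add(path)
    return path


# One rename step, as in the example above: Test02.txt came from Test01.txt.
renames = {
    "Test02.txt": ("Test01.txt", "0936f74a58571dd87ad343cc3d6ae8434ad86fc4"),
}
print(original_name("Test02.txt", renames))
```

The "file ID" is then the node ID of the first entry in the filelog of the name this returns. In this example that is the same as the recorded copy revision, but after several edits before a rename the two can differ.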