git, git-plumbing

Git: get blob path in .git directory


I want to access some blobs at some point in repo history.

Currently, I do that with git show $REV:$PATH. But the files are quite large, and I don't want them to be read and piped through a script. I want to get their paths and then read them as plain files.

May I rely on the layout described in the current documentation (e.g. .git/objects/ee/2403ffd236587a2b17ddc35b0e711fc99ba6a0), get the file hash, and transform it into a path manually? That is, can I assume that the layout will not change in future versions and that the objects directory always has this structure? Is there a simpler way to do this with some plumbing command?


Solution

  • While the blob data is inviolable and sacrosanct, it's also in a format unusable to mere mortals:

    • As j6t said in a comment, it's zlib-deflated (but this is an implementation detail, not a promise: you are not supposed to open the file yourself and run it through a zlib inflator, you are supposed to let Git do that for you).

    • As Leon said in a comment, it may have been packed, in which case there is no unpacked object file to open and read in the first place. Instead, you would have to open the pack index files (to find the correct pack file), then the pack file itself (to locate the object and its delta bases), and then undo the xdelta-style (but not actually xdelta) compression of those items.
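
The packing point above can be seen in a small experiment. The following is only a sketch under stated assumptions: it builds a throwaway repository in a temporary directory, and the names demo.txt, repo, oid and loose are made up for illustration. It computes the loose-object path by hand, the way the question proposes, and then checks whether that file survives a git gc.

```shell
set -eu

# Throwaway repository; every name here is illustrative, not from the question.
repo=$(mktemp -d)
cd "$repo"
git init -q .
printf 'hello\n' > demo.txt
git add demo.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'add demo.txt'

# Hash of the blob, then the hand-built loose-object path:
# first two hex digits as the directory, the remainder as the file name.
oid=$(git rev-parse HEAD:demo.txt)
dir=$(printf '%s' "$oid" | cut -c1-2)
rest=$(printf '%s' "$oid" | cut -c3-)
loose="$(git rev-parse --git-path objects)/$dir/$rest"

if [ -f "$loose" ]; then before=loose; else before=missing; fi

# Repacking moves reachable objects into a pack and drops the loose copies.
git gc --quiet

if [ -f "$loose" ]; then after=loose; else after=packed; fi
echo "before gc: $before, after gc: $after"

# The object is still available through Git even if the loose file is gone.
git cat-file -e "$oid" && echo "object still readable through Git"
```

If the hand-built path stops existing after the repack while git cat-file -e still succeeds, that illustrates why a script should not depend on the loose-object layout.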

    If you want to read the file with plumbing commands, you could first find the hash:

    $ git rev-parse HEAD~20:Makefile
    bdb55792f11a9f9565c4aad147a492caed7f09c3
    

    and then use git cat-file -p to extract the raw object, or git cat-file -t to get its type (or --batch-check to read information about the object, etc.). Note that you can in fact just pass the path directly to git cat-file itself as well:

    $ git cat-file -t HEAD~20:Makefile
    blob
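
If the goal is a plain file that another program can open, one possible approach is to let Git write the blob to a temporary file, so the script only redirects output and never holds the contents itself. This is a minimal sketch, not the only way to do it: the throwaway repository, the Makefile contents, and the variable names spec, size and out are assumptions made for illustration.

```shell
set -eu

# Throwaway repository standing in for the real one (illustrative only).
repo=$(mktemp -d)
cd "$repo"
git init -q .
printf 'all:\n\techo built\n' > Makefile
git add Makefile
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'add Makefile'

# Any <rev>:<path> specifier works here, e.g. HEAD~20:Makefile in a real repo.
spec=HEAD:Makefile

# Ask for the size first; this does not print the contents.
size=$(git cat-file -s "$spec")

# Git inflates or unpacks the object; the shell only redirects the stream.
out=$(mktemp)
git cat-file blob "$spec" > "$out"

echo "wrote $size bytes to $out"
```

The temporary file can then be opened like any other file and deleted when no longer needed. It holds the in-repository form of the data.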
    

    Note, however, one more potential stumbling block: when accessing the contents of a blob with either git cat-file -p <blob-specifier> or git show <blob-specifier>, you get the in-repository format of the data. That is, when checking out a specific commit (with git checkout), Git will extract a .gitattributes file and/or use the git config settings to find smudge filters and/or CR-LF adjustments that are to be made. These filters are applied to the in-repository data to produce the working-tree copy of the file. But when you use git show or git cat-file -p to access the repository data, no filters are used.
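
To make the difference concrete, here is a hedged sketch that compares the in-repository bytes with the working-tree form. It assumes a Git version that supports git cat-file --filters (which applies the conversions configured for the given path), and it uses a throwaway repository with a made-up world.txt and a .gitattributes rule requesting CRLF line endings.

```shell
set -eu

# Throwaway repository; file names and the attribute rule are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q .
printf '*.txt text eol=crlf\n' > .gitattributes
# The working-tree copy uses CRLF; Git stores it with LF because of the rule above.
printf 'hello\r\n' > world.txt
git add .gitattributes world.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'add world.txt'

# In-repository form: no filters or line-ending conversion applied.
raw=$(git cat-file -p HEAD:world.txt | wc -c | tr -d ' ')

# Working-tree form: conversions for this path are applied on output.
filtered=$(git cat-file --filters HEAD:world.txt | wc -c | tr -d ' ')

echo "in-repository: $raw bytes, working-tree form: $filtered bytes"
```

With this setup the filtered output should be one byte longer than the raw blob, because the single LF is written out as CRLF. Whether you want the raw or the filtered form depends on what the consumer of the file expects.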