
How to find the git object id of an object with a known hash


I am using bfg to clean my git repo. To get the list of big files to delete, I use this script. However, for some files I only want to delete specific versions from the repo, not every version.

bfg has the option to "strip blobs with the specified Git object ids". When I run the above script, I am given a hash for each object in the list. Given that hash, how can I find out the git object id of that specific object so that I can delete it with bfg?


Solution

  • That script appears to list the git object id already: the hash it prints for each object is that object's git object id.

    If there is a particular commit you want to clean, you can use the command line from "Which commit has this blob?" to check whether a given object id is part of that commit.

    git log --all --pretty=format:%H -- <path> | \
     xargs -n1 -I% sh -c "git ls-tree % <path> | \
     grep -q <hash> && echo %"
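
    The same lookup can be written as a small function, as a minimal sketch. It walks the same commits as the pipeline above, but asks git for the blob id of the file in each commit instead of grepping the ls-tree output. The function name commits_with_blob and its two arguments are hypothetical; git log, git rev-parse --verify --quiet and the <commit>:<path> syntax are standard git.

    ```shell
    # commits_with_blob FILE BLOB_ID_PREFIX   (hypothetical helper name)
    # Prints each commit that changed FILE and in which FILE has a blob id
    # starting with BLOB_ID_PREFIX.
    commits_with_blob() {
      file=$1
      prefix=$2
      git log --all --format=%H -- "$file" |
      while read -r commit; do
        # <commit>:<path> resolves to the blob id of the file in that commit;
        # it fails (and we skip the commit) if the file was deleted there.
        id=$(git rev-parse --verify --quiet "$commit:$file") || continue
        case $id in
          "$prefix"*) echo "$commit" ;;
        esac
      done
    }
    ```

    Like the original pipeline, this only reports commits that touched the file; later commits that still contain the same blob unchanged are not listed.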
    

    For instance, in my repo seec:

    a255b5c1d469591037e4eacd0d7f4599febf2574 12884 seec.go
    a7320d8c0c3c38d1a40c63a873765e31504947ff 12928 seec.go
    

    I want to clean the a7320d8 version of seec.go.

    As seen in BFG commit 12d1b00:

    People can get a list of blob-ids using "git rev-list --all --objects", then grep to list all files in directories they want to nuke, and pass that to the BFG.

    Note: the test for the -bi (--strip-blobs-with-ids) option reads:

    val blobIdsFile = Path.createTempFile()
    blobIdsFile.writeStrings(badBlobs.map(_.name()),"\n")
    run(s"--strip-blobs-with-ids ${blobIdsFile.path}")
    

    This means the parameter to -bi is the path of a file containing the blob id(s), one per line.
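
    A minimal sketch of preparing such a file follows. The helper name write_blob_ids, the file name blob-ids.txt, the bfg.jar location and the seec.git clone are all assumptions for illustration; only the --strip-blobs-with-ids option comes from the test above. The full 40-character id from the listing is used here.

    ```shell
    # write_blob_ids OUTFILE ID...   (hypothetical helper name)
    # Writes the given ids to OUTFILE, one per line. Rejects an empty string
    # or anything containing a character other than lowercase hex digits.
    write_blob_ids() {
      out=$1
      shift
      : > "$out"
      for id in "$@"; do
        case $id in
          ''|*[!0-9a-f]*)
            echo "not a hex object id: $id" >&2
            return 1
            ;;
        esac
        printf '%s\n' "$id" >> "$out"
      done
    }

    # Usage (assumed paths; adjust to your setup):
    # write_blob_ids blob-ids.txt a7320d8c0c3c38d1a40c63a873765e31504947ff
    # java -jar bfg.jar --strip-blobs-with-ids blob-ids.txt seec.git
    ```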


    I can also check that what I just got is indeed the blob id by looking for its commit:

    vonc@bvonc MINGW64 ~/data/git/seec (master)
    $ git log --all --pretty=format:%H -- seec.go | xargs -n1 -I% sh -c "git ls-tree % seec.go|\
    grep -q a7320d8 && echo %"
    

    I get: commit c084402.

    Let's see if that commit actually includes the seec.go blob id a7320d8 (using "Git - finding the SHA1 of an individual file in the index").
    I can compute the blob id of a file from a GitHub commit:

    vonc@bvonc MINGW64 ~/data/git/seec (master)
    $ (echo -ne "blob $(curl -s https://raw.githubusercontent.com/VonC/seec/c084402/seec.go --stderr -|wc -c)\0"; \
       curl -s https://raw.githubusercontent.com/VonC/seec/c084402/seec.go --stderr -) | \
      sha1sum | awk '{ print $1 }'
    a7320d8c0c3c38d1a40c63a873765e31504947ff
    

    Bingo.
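
    The same computation works on a local file, without curl. A minimal sketch, assuming sha1sum is installed; the function name blob_id is hypothetical:

    ```shell
    # blob_id FILE   (hypothetical helper name)
    # Prints the git blob id (SHA-1) of a local file.
    blob_id() {
      # Size in bytes; tr strips the padding some wc implementations add.
      size=$(wc -c < "$1" | tr -d ' ')
      # git hashes the header "blob <size>", a NUL byte, then the raw content.
      { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | awk '{ print $1 }'
    }
    ```

    For a checked-out file, git hash-object <file> should print the same value as long as no content filters (such as line-ending conversion) apply.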

    If I want to strip out the seec.go blob a7320d8, I can pass that blob id to bfg (in a "blob ids" file).