Tags: git, git-submodules, git-filter-branch, git-history-rewrite

Git filter-branch or filter-repo to update submodule gitlink?


I have a Git repository A that uses B as a submodule.

B's history has been rewritten after an LFS migration, but I would love it if A could still have its entire history functional. After the LFS migration, I do have a mapping OldSHA1 > NewSHA1 for submodule B, and now I just want to rewrite OldSHA1 gitlinks to NewSHA1 in repo A.

I have tried running a filter-repo command on repo A with the full OldSHA1==>NewSHA1 mapping as a parameter, but it doesn't seem to pick up gitlinks.

I also tried filter-branch as detailed in the thread "Repository with submodules after rewriting history of submodule", which seems to address exactly what I am trying to accomplish. I tried doing this with a single OldSHA1=>NewSHA1 mapping, and here's the command I am trying to run:

git filter-branch --commit-filter '
  if [ "$GIT_COMMIT" = <OLDSHA1> ];
  then
    cd <SUBMODULE_ABSOLUTE_PATH>;
    git checkout <NEWSHA1>;
    cd ..;
    git add -u;
    git commit -m "updated gitlink";
  else
    git commit-tree "$@";
  fi' HEAD 

But I keep getting the following error:

fatal: reference is not a tree: <NEWSHA1>

Somehow, git checkout doesn't seem to pick up the tree of submodule B. I even tried to specify a path with git -C AbsolutePathToSubModule checkout but I get the same error.

So, a few questions:

  • Is there something obvious I'm doing wrong here?
  • Is there a better way of accomplishing this? It seems like I "simply" want to replace one string with another somewhere in the object database, but I can't find a simple way to do that.
  • Is there a way to do this on the entire repo like filter-repo does? Or should I run this on every single branch?

Thanks for any help, advice, clue about how to accomplish this!

Edit 1:

After an answer in the comments, I edited my script to this:

git filter-branch --commit-filter '
  if [ "$GIT_COMMIT" = <SpecificCommitID> ];
  then
    git update-index --add --cacheinfo 160000,<SpecificNewSha1>,<SubmodulePath>;
  fi
  git commit-tree "$@";
  ' HEAD

But it has no effect :(

WARNING: Ref 'refs/heads/develop' is unchanged

Edit 2:

Thanks a lot to user @torek! This is a snippet to help anyone get started:

git filter-branch --index-filter '
if [ "$(git rev-parse --quiet --verify :<SUBMODULEPATH>)" = <OLDSHA1> ];
then
  git update-index --cacheinfo 160000,<NEWSHA1>,<SUBMODULEPATH>;
fi' HEAD --all

From there, you have to loop over all OLDSHA1/NEWSHA1 pairs, or use a case statement as a dictionary, as shown in their answer below.
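Since the migration already produced an OldSHA1 to NewSHA1 mapping, one way to avoid hand-writing every pair is to look hashes up in a file. This is only a minimal sketch: the `lookup_new_hash` name and the file format (one whitespace-separated `OLDSHA1 NEWSHA1` pair per line) are assumptions for illustration, not something git produces in that form.

```shell
#!/bin/sh
# lookup_new_hash OLD MAPFILE
# Prints the new hash paired with OLD in MAPFILE, or OLD itself when the file
# has no entry for it. MAPFILE is assumed (hypothetically) to contain one
# "OLDSHA1 NEWSHA1" pair per line.
lookup_new_hash() {
    awk -v old="$1" '
        # Appending "" forces a string comparison, so an all-digit hash is
        # never compared as a floating-point number.
        ($1 "") == (old "") { print $2; found = 1; exit }
        END { if (!found) print old }
    ' "$2"
}
```

Inside the index filter, the result could then be used as `new=$(lookup_new_hash "$sm_hash" /absolute/path/to/mapfile)`, where the path is a placeholder. An absolute path matters because the filter does not run in the original working directory.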

Thanks again a lot!


Solution

  • This:

    git filter-branch --commit-filter '
      if [ "$GIT_COMMIT" = <SpecificCommitID> ];
      then
        git update-index --add --cacheinfo 160000,<SpecificNewSha1>,<SubmodulePath>;
      fi
      git commit-tree "$@";
      ' HEAD
    

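    is not what you want, as it tests the hash ID of the superproject commit. (A commit filter is also handed an already-written tree ID as its argument, so index changes made inside it come too late to affect the new commit; an --index-filter runs early enough.) You need to test the hash ID of the submodule commit in the index entry, e.g.: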

    if [ "$(git rev-parse --quiet --verify :SubmodulePath)" = oldhash ]; then ...; fi
    

    and of course that has to test all the old rewritten submodule hash IDs to run them through the mapping function.

    (This will definitely be easier in filter-repo where you can use a dictionary lookup.)


    If you use:

    sm_hash=$(git rev-parse :submodule-path)
    

    or similar to prefix the test, remember to account for the case where the submodule path is absent from the index, so that :submodule-path does not parse. I think --quiet --verify will do the right thing here (quietly produce no output), but it's worth testing first.
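One way to run that test without touching repo A is a throwaway repository. The sketch below is an illustration under assumptions: the path `sub` and the hash are made-up placeholders, and the gitlink is planted directly in the index with `git update-index --cacheinfo` rather than by adding a real submodule.

```shell
#!/bin/sh
# Scratch repository used only for this check; nothing here touches repo A.
scratch=$(mktemp -d)
git -C "$scratch" init --quiet

# Hypothetical gitlink hash, for illustration only.
fake_hash=0123456789abcdef0123456789abcdef01234567

# Case 1: the path is absent from the index.
absent_out=$(git -C "$scratch" rev-parse --quiet --verify :sub) \
    && absent_status=0 || absent_status=$?

# Case 2: the path is recorded as a gitlink (mode 160000) in the index.
git -C "$scratch" update-index --add --cacheinfo "160000,$fake_hash,sub"
present_out=$(git -C "$scratch" rev-parse --quiet --verify :sub) \
    && present_status=0 || present_status=$?

# Report what was observed for both cases.
printf 'absent:  status=%s output=[%s]\n' "$absent_status" "$absent_out"
printf 'present: status=%s output=[%s]\n' "$present_status" "$present_out"
```

If the absent case reports a non-zero status with empty output, then an empty `sm_hash` (or the failed exit status) is a usable signal for "no submodule in this commit".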

    Once you have the hash, you can do:

    case $sm_hash in
    old1) new=new1;;
    old2) new=new2;;
    ...
    oldN) new=newN;;
    *) new=$sm_hash
    esac
    

    as a poor man's dictionary lookup with default, but you will want to skip updating the submodule hash if it's unchanged-or-empty.
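
Putting the pieces above together, the lookup and the "skip if unchanged or empty" rule could be sketched as two shell functions. This is a sketch under assumptions, not a tested migration script: the function names, the default `SUBMODULE_PATH` value, and both hashes in the case arm are hypothetical placeholders to replace with real values.

```shell
#!/bin/sh
# Functions meant to be sourced from an index filter.
# SUBMODULE_PATH and the hashes in the case arm are hypothetical placeholders.
SUBMODULE_PATH=${SUBMODULE_PATH:-path/to/submodule}

# Poor man's dictionary: print the new hash for an old one, or the input
# unchanged when there is no mapping for it.
map_submodule_hash() {
    case $1 in
    0123456789abcdef0123456789abcdef01234567) echo fedcba9876543210fedcba9876543210fedcba98;;
    # ...one arm per rewritten submodule commit...
    *) echo "$1";;
    esac
}

# Rewrite the gitlink in the current index. The update is skipped when the
# submodule path is absent (lookup fails or is empty) or the hash is unmapped.
rewrite_gitlink() {
    sm_hash=$(git rev-parse --quiet --verify ":$SUBMODULE_PATH") || return 0
    [ -n "$sm_hash" ] || return 0
    new=$(map_submodule_hash "$sm_hash")
    if [ "$new" != "$sm_hash" ]; then
        git update-index --cacheinfo "160000,$new,$SUBMODULE_PATH"
    fi
}
```

Assuming the file is saved at some absolute path (shown here as a placeholder), it could be wired in like this:

    git filter-branch --index-filter '. /absolute/path/to/rewrite-gitlink.sh; rewrite_gitlink' -- --all

Keeping the mapping in its own function means the case arms can be generated from the OldSHA1/NewSHA1 list without touching the update logic.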