How to "rebase tags" in git?

Suppose I have the following simple git repository: a single branch, some commits one after another, a couple of them having been tagged (with annotated tags) after committing each of them, and then one day I decide I want to change the first commit (which, by the way, is not tagged, if that changes anything). So I run git rebase --interactive --root and just mark 'edit' for the initial commit, change something in it and git rebase --continue. Now all commits in my repository have been recreated, therefore their sha1's have changed. However, the tags I created are completely unchanged, still pointing to the sha1 of the previous commits.

Is there an automatic way of updating the tags to the corresponding commits created when rebasing?

Some people suggest using git filter-branch --tag-name-filter cat -- --tags, but that first warns me that each of my tags is unchanged and then says that each tag has been changed to itself (same tag name and same commit hash). And still, git show --tags says that the tags point to the old commits.

Solution

In one sense, it's too late (but hang on, there's good news). The filter-branch code is able to adjust the tags because it keeps, during its filtering, a mapping of old-sha1 to new-sha1.

In fact, both filter-branch and rebase use the same basic idea, which is that each commit is copied, by expanding the original contents, making any desired changes, and then making a new commit out of the result. This means that during each copy step it's trivial to write the <old-sha1, new-sha1> pair to a file, and then once you're done, you fix up references by looking up the new-sha1 from their old-sha1. Once all the references are done, you're committed to the new numbering and you remove the mapping.

The map is gone by now, hence "in one sense, it's too late".

Luckily, it's not too late. :-) Your rebase is repeatable, or at least, the key parts of it probably are. Moreover, if your rebase was simple enough, you might not need to repeat it at all.

Let's look at the "repeat" thought. We have an original graph G of some arbitrary shape:

     o--o
    /    \
o--o--o---o--o   <-- branch-tip
 \          /
  o--o--o--o

(whoa, a flying saucer!). We've done a git rebase --root on (some part of) it, copying (some or all) commits (preserving merges or not) to get some new graph G':

    o--o--o--o   <-- branch-tip
   /
  /  o--o
 /  /    \
o--o--o---o--o
 \          /
  o--o--o--o

I've drawn this sharing only the original root node (and now it's a sailboat with a crane on it, instead of a flying saucer). There might be more sharing, or less. Some of the old nodes may have become completely unreferenced and hence been garbage-collected (probably not: the reflogs should keep all the original nodes alive for at least 30 days). But in any case, we still have tags pointing into some "old G part" of G', and those references guarantee that those nodes, and all their parents, are still in the new G'.

Thus, if we know how the original rebase was done, we can repeat it on the sub-graph of G' that is the important part of G. How hard or easy this is, and what command(s) to use to do it, depend on whether all of the original G is in G', what the rebase command was, how much G' overlays the original G, and more (since git rev-list, which is our key to getting a list of nodes, probably has no way to distinguish between "original, was-in-G" and "new to G'" nodes). But it probably can be done: it's just a Small Matter Of Programming, at this point.

If you do repeat it, this time you'd want to keep the mapping, especially if the resulting graph G'' doesn't completely overlap G', because what you need now is not the map itself, but a projection of this map, from G into G'.

We simply give each node in the original G a unique relative address (e.g., "from the tip, find parent commit #2; from that commit, find parent commit #1; from that commit...") and then find the corresponding relative address in G''. This allows us to rebuild the critical parts of the map.

Depending on the simplicity of the original rebase, we might be able to jump directly to this phase. For instance, if we know for sure that the entire graph was copied without flattening (so that we have two independent flying saucers) then the relative address for tag T in G is the relative address we want in G', and now it's trivial to use that relative address to make a new tag pointing to the copied commit.
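As a rough illustration of such relative addresses, git name-rev can print where a commit sits relative to a ref (something like old-tip~3 or old-tip~2^2~1). The sketch below assumes the old and new graphs really do have the same shape. The function name relocate_commit and the temporary ref name are made up for this example, and I have not tried it on complicated histories.

```shell
# Hypothetical helper: find the commit in the new graph that sits at the
# same relative address as <commit> does in the old graph.
# usage: relocate_commit <old-tip> <new-tip> <commit>
relocate_commit() {
    local name rel
    # name-rev names commits relative to refs, so make a temporary ref for
    # the old tip (the ref name is an assumption; pick an unused one).
    git update-ref refs/heads/tmp-old-tip "$1" || return
    name=$(git name-rev --name-only --refs=refs/heads/tmp-old-tip "$3")
    git update-ref -d refs/heads/tmp-old-tip
    case $name in
    tmp-old-tip*) rel=${name#tmp-old-tip} ;;
    *) echo "error: $3 is not reachable from $1" 1>&2; return 1 ;;
    esac
    # rel is empty for the tip itself, or something like "~3" or "~2^2~1".
    git rev-parse --verify "$2$rel^{commit}"
}
```

For a tag T this would be called as relocate_commit "$orig_master" master "T^{commit}", and the ID it prints is where the new tag should point.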

Big update based on new information

Using the additional information that the original graph was completely linear, and that we've copied every commit, we can use a very simple strategy. We still need to reconstruct the map, but now it's easy, as every old commit has exactly one new commit, which has some linear distance (which is easy to represent as a single number) from either end of the original graph (I'll use distance-from-tip).

That is, the old graph looks like this, with just one branch:

A <- B <- C ... <- Z   <-- master

The tags simply point to one of the commits (via an annotated tag object), e.g., perhaps tag foo points to an annotated-tag object that points to commit W. We then note that W is three commits back from Z (it is Z~3, i.e., the fourth commit counting from the tip).

The new graph looks exactly the same except that each commit has been replaced with its copy. Let's call these A', B', and so on, through Z'. The (single) branch points to the tip-most commit, i.e., Z'. We'll want to adjust the original tag foo so that we have a new annotated-tag object pointing to W'.
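For a linear history, git can count the distance from the tip directly. This is only a sketch: distance_from_tip is a name I made up, and it assumes the tagged commit is an ancestor of the tip you pass in.

```shell
# Hypothetical helper: print how many commits <commit-ish> is behind <tip>
# in a linear history (0 means it is the tip itself).
# usage: distance_from_tip <commit-ish> <tip>
distance_from_tip() {
    # "A..B" selects the commits reachable from B but not from A;
    # ^{commit} peels an annotated tag down to the commit it points at.
    git rev-list --count "$1^{commit}..$2"
}
```

If this prints N for the old tag and the old tip, the corresponding new commit is master~N.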

We'll need the SHA-1 ID of the original tip-most commit. This should be easy to find in the reflog for the (single) branch, and is probably simply master@{1} (although that depends on how many times you have tweaked the branch since then; and if there are new commits you added since rebasing, we need to take those into account as well). It may well also be in the special ref ORIG_HEAD, which git rebase leaves behind in case you decide you don't like the rebase result.

Let's assume that master@{1} is the correct ID and that there are no such new commits. Then:

orig_master=$(git rev-parse master@{1})

would save this ID in $orig_master.

If we wanted to build the full map, this would do it:

$ git rev-list $orig_master > /tmp/orig_list
$ git rev-list master > /tmp/new_list
$ wc -l /tmp/orig_list /tmp/new_list

(the line counts for both files should be the same; if not, some assumption here has gone wrong. I'll also leave out the shell $ prefix below, since the rest of this really should go into a script, even for one-time use, in case of typos or a need for tweaks)

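# open both lists on spare file descriptors; there must be
# no space between the descriptor number and the "<"
exec 3< /tmp/orig_list 4< /tmp/new_list
while read orig_id; do
    read new_id <&4; echo "$orig_id $new_id"
done <&3 > /tmp/mapping
# close the descriptors again
exec 3<&- 4<&-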

(this, quite untested, is meant to paste the two files together—sort of a shell version of Python zip on the two lists—to get the mapping). But we don't actually need the mapping, all we need is those "distance from tip" counts, so I'm going to pretend we didn't bother here.
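If you do want the mapping file, the standard paste utility does the same line-by-line pairing without any file-descriptor juggling. A minimal sketch, where build_mapping is just a name I picked:

```shell
# Hypothetical helper: pair line N of the first file with line N of the
# second file, separated by a single space.
# usage: build_mapping <orig-list> <new-list> > <mapping-file>
build_mapping() {
    paste -d ' ' "$1" "$2"
}
```

Here that would be build_mapping /tmp/orig_list /tmp/new_list > /tmp/mapping.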

Now we need to iterate over all tags:

# We don't want a pipe here because it's
# not clear what happens if we update an existing
# tag while `git for-each-ref` is still running.
git for-each-ref refs/tags > /tmp/all-tags

# it's also probably a good idea to copy these
# into a refs/original/refs/tags name space, a la
# git filter-branch.
while read sha1 objtype tagname; do
    git update-ref -m backup refs/original/$tagname $sha1
done < /tmp/all-tags

# now replace the old tags with new ones.
# it's easy to handle lightweight tags too.
while read sha1 objtype tagname; do
    case $objtype in
    tag) adj_anno_tag $sha1 $tagname;;
    commit) adj_lightweight_tag $sha1 $tagname;;
    *) echo "error: shouldn't have objtype=$objtype";;
    esac
done < /tmp/all-tags

We still need to write the two adj_anno_tag and adj_lightweight_tag shell functions. First, though, let's write a shell function that produces the new ID given the old ID, i.e., looks up the mapping. If we used a real mapping file, we would grep or awk for the first entry, then print the second. Using the sleazy single-old-file method, though, what we want is the line number of the matching ID, which we can get with grep -n:

map_sha1() {
    local grep_result line

    grep_result=$(grep -n $1 /tmp/orig_list) || {
        echo "WARNING: ID $1 is not mapped" 1>&2
        echo $1
        return 1
    }
    # annoyingly, grep produces "4:matched-text"
    # on a match.  strip off the part we don't want.
    line=${grep_result%%:*}
    # now just get git to spit out the ID of the (line - 1)'th
    # commit before the tip of the current master.  the "minus
    # one" part is because line 1 represents master~0, line 2
    # is master~1, and so on.
    git rev-parse master~$((line - 1))
}

The WARNING case should never happen, and the rev-parse should never fail, but we probably should check the return status of this shell function.
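For comparison, here is roughly what the lookup would look like against a real mapping file with one "old-id new-id" pair per line. It is a sketch only; lookup_sha1 and its argument order are my own invention.

```shell
# Hypothetical alternative to map_sha1 that reads a real mapping file.
# Prints the new ID and returns 0, or prints nothing and returns 1.
# usage: lookup_sha1 <old-id> <mapping-file>
lookup_sha1() {
    awk -v old="$1" '
        # appending "" forces a string comparison, so an ID that happens
        # to look like a number is not compared numerically
        ($1 "") == (old "") { print $2; found = 1; exit }
        END { exit !found }
    ' "$2"
}
```

Unlike the line-number trick, this does not care whether the history is linear, as long as the mapping file itself is right.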

The lightweight tag updater is now pretty trivial:

adj_lightweight_tag() {
    local old_sha1=$1 new_sha1 tag=$2

    new_sha1=$(map_sha1 $old_sha1) || return
    git update-ref -m remap $tag $new_sha1 $old_sha1
}

Updating an annotated tag is more difficult, but we can steal code from git filter-branch. I'm not going to quote it all here; instead, I just give you this bit:

$ vim $(git --exec-path)/git-filter-branch

and these instructions: search for the second occurrence of git for-each-ref, and note the git cat-file piped to sed with the result passed to git mktag, which sets the shell variable new_sha1.

This is what we need to copy the tag object. The new copy must point to the object found by using $(map_sha1) on the commit to which the old tag pointed. We can find that commit the same way filter-branch does, using git rev-parse $old_sha1^{commit}.

(Incidentally, writing up this answer and looking at the filter-branch script, it occurs to me that there's a bug in filter-branch, which we'll import to our post-rebase tag-fixup code: if an existing annotated tag points to another tag, we don't fix it. We only fix lightweight tags and tags pointing directly to commits.)
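To give an idea of the shape, here is a simplified sketch of the copy step. It is not the filter-branch code: it only rewrites the object line and drops any PGP signature (which would no longer verify). The name retarget_anno_tag is made up, and like everything else here it only handles tags that point directly at a commit.

```shell
# Hypothetical helper: copy an annotated tag object so that the copy points
# at <new-commit>, then move the tag ref to the copy.
# usage: retarget_anno_tag <refname> <old-tag-object> <new-commit>
retarget_anno_tag() {
    local new_commit new_tag
    new_commit=$(git rev-parse --verify "$3^{commit}") || return
    # line 2 of a tag object is "type <type-of-target>"
    git cat-file tag "$2" | sed -n 2p | grep -q '^type commit$' || {
        echo "error: $1 does not point directly at a commit" 1>&2
        return 1
    }
    # rewrite the "object" line, strip any signature, store the new object
    new_tag=$(git cat-file tag "$2" |
        sed -e "1s/^object .*/object $new_commit/" \
            -e '/^-----BEGIN PGP SIGNATURE-----$/,$d' |
        git mktag) || return
    # the trailing old value makes update-ref refuse if the ref has moved
    git update-ref -m remap "$1" "$new_tag" "$2"
}
```

An adj_anno_tag built on this would find the old commit with git rev-parse $sha1^{commit}, run it through map_sha1, and pass the result in as the third argument.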

Note that none of the example code above is actually tested, and turning it into a more-general-purpose script (that could be run after any rebase, for instance, or better yet, incorporated into interactive rebase itself) requires a fair amount of additional work.
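Whatever you end up running, it is worth checking the result afterwards. One way, sketched here with a made-up function name, is to list every tag whose commit cannot be reached from the rebased branch; after a successful fix-up of a single-branch repository the list should be empty.

```shell
# Hypothetical check: print every tag whose commit is not reachable from
# <branch>, i.e. tags that still point into the old history.
# usage: list_stale_tags <branch>
list_stale_tags() {
    git for-each-ref --format='%(refname)' refs/tags |
    while read -r ref; do
        # --is-ancestor exits 0 only if the first commit is an ancestor
        # of (or the same as) the second
        git merge-base --is-ancestor "$ref^{commit}" "$1" || echo "$ref"
    done
}
```

Running list_stale_tags master before the fix-up should print all of your tags, which is a quick confirmation of the problem described in the question.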