How do I prepend history to a Git repository?


I have a project that has existed in two SVN repositories. The second SVN repository was created simply by adding the files from a checkout of the old SVN repository, with the SCM information stripped. The contents of the files are byte-identical, but there is no associated SCM metadata.

I have taken the new SVN repository and ported it into a Git repository via git-svn. Now I would like to import the old repository and somehow link it to the new repository so I can see the history across both. Is there a simple way to do this without hand-stitching the two repositories together?


Solution

  • See also: the How do I re-play my commits of a local Git repository, on top of a project I forked on github.com? question (and my answer there), although the situation is slightly different, I think.


    You have at least three possibilities:

    • Use grafts to join two histories, but do not rewrite history. This means that you (and anybody who has the same grafts) would have full history, while other users would have a smaller repository. This also avoids problems with rewritten history if somebody already started working on top of the converted repository with a shorter history.

    • Use grafts to join two histories, and check that it is correct using "git log" or "gitk" (or other Git history browser/viewer), then rewrite history using git filter-branch; then you can remove the grafts file. This means that everybody who clones (fetches) from a rewritten repository would get the full, joined history. But rewriting history is a big no if somebody already based work on converted short-history repository (but this case might not apply to you).

    • Use git replace to join two histories. This would allow people to choose whether they want the full history or just the current history, by fetching refs/replace/ (then they get the full history) or not (then they get the short history). Unfortunately, at the time of writing this requires a not-yet-released version of Git: either the development ('master') version or one of the release candidates for 1.6.5. The refs/replace/ hierarchy is planned for the upcoming Git version 1.6.5.


    Below there are step-by-step instructions for all those methods: grafts (local), rewriting history using grafts, and refs/replace/.

    In all cases I assume that you have both the current and historical repository history in a single repository (you can add history from another repository using git remote add). I also assume that (one of) the branches in the short-history repository is named 'master', and that the branch (commit) of the historical repository where you want to attach current history is called 'history'. You would have to substitute your own branch names (or commit IDs).
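
Bringing the old history into the converted repository might look like the sketch below. The two throwaway repositories under a temporary directory, the remote name `old`, and the file names are assumptions made purely so the example runs on its own. In practice only the last three commands matter, with the path replaced by wherever the historical repository lives.

```shell
# Identity for the demo commits only (assumed values).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Stand-in for the historical repository.
git init -q "$work/old"
(
    cd "$work/old" &&
    git symbolic-ref HEAD refs/heads/master &&
    echo v1 > file.txt &&
    git add file.txt &&
    git commit -q -m "old: last commit of the old repository"
)

# Stand-in for the converted short-history repository.
git init -q "$work/new"
cd "$work/new"
git symbolic-ref HEAD refs/heads/master
echo v1 > file.txt
git add file.txt
git commit -q -m "new: initial import"

# The part that matters: fetch the old history and name its tip 'history'.
git remote add old "$work/old"
git fetch -q old
git branch --no-track history old/master
```

After this, 'master' and 'history' live in the same repository but share no commits yet.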

    Finding commit to attach (root of short history)

    First, you have to find the commit (that is, its SHA-1 identifier) in the short history that you want to attach to the full history. It would be the first commit in the short history, i.e. the root commit (the commit without any parents).

    There are two ways of finding it. If you are sure that you do not have any other root commit, you can find the last (bottommost) commit in topological order, using:

    $ git rev-list --topo-order master | tail -n 1
    

    (where tail -n 1 is used to get the last line of the output; you don't need to use it if you don't have it.)

    If there is a possibility of multiple root commits, you can find all parentless commits using the following one-liner:

    $ git rev-list --parents master | grep -v ' '
    

    (where grep -v ' ', that is, a space between single quotes, is used to filter out all commits which have any parents). If there is more than one such commit, you have to inspect them (using e.g. "git show <commit>") and select the one that you want to attach to the earlier history.
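
As an alternative sketch, reasonably recent Git versions accept `--max-parents=0` on rev-list, which lists root commits directly and makes the grep filter unnecessary. Whether your Git is new enough for this option is an assumption to check. The throwaway repository below exists only to make the comparison runnable.

```shell
# Identity for the demo commits only (assumed values).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo v1 > file.txt
git add file.txt
git commit -q -m "new: initial import"
echo v2 > file.txt
git commit -q -am "new: more work"

# Root commits reachable from master, without the grep filter.
roots=$(git rev-list --max-parents=0 master)

# The one-liner from the text, for comparison.
roots_grep=$(git rev-list --parents master | grep -v ' ')

echo "$roots"
```

With a single root commit, both variables hold the same SHA-1.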

    Let's call this commit TAIL. You can save it in a shell variable using (assuming that simpler method works for you):

    $ TAIL=$(git rev-list --topo-order master | tail -n 1)
    

    In the description below I will use $TAIL to mean that you have to substitute the SHA-1 of the bottommost commit in the current (short) history... or allow the shell to do the substitution for you.

    Finding a commit to attach to (top of the historical repository)

    This part is simple. We have to convert the symbolic name of the commit into an SHA-1 identifier. We can do this using "git rev-parse":

    $ git rev-parse --verify history^0
    

    (where 'history^0' is used in place of 'history' in case 'history' is a tag; we need the SHA-1 of the commit, not of a tag object). As with the commit to attach, let's give this commit ID a name: TOP. You can save it in a shell variable using:

    $ TOP=$(git rev-parse --verify history^0)
    

    Joining history using a grafts file

    The grafts file, located at .git/info/grafts (you need to create this file if it doesn't exist and you want to use this mechanism), is used to replace the parent information for a commit. It is a line-based format, where each line contains the SHA-1 of a commit we want to modify, followed by a space-separated list of zero or more commits that we want the given commit to have as parents; this is the same format that "git rev-list --parents <revision>" outputs.

    We want the $TAIL commit, which doesn't have any parents, to have $TOP as its single parent. So the info/grafts file should contain a line with the SHA-1 of the $TAIL commit, followed by a space and the SHA-1 of the $TOP commit. You can use the following one-liner for this (see also the examples in the git filter-branch documentation):

    $ echo "$TAIL $TOP" >> .git/info/grafts
    

    Now you should check, using "git log", "git log --graph", "gitk" or another history browser, that you joined the histories correctly.

    Rewriting history according to the grafts file

    Please note that this would change history!

    To make the history as recorded in the grafts file permanent, it is enough to use "git filter-branch" to rewrite the branches you need. If there is only a single branch that needs to be rewritten ('master'), it can be as simple as:

    $ git filter-branch $TOP..master
    

    (This would process only the minimal set of commits.) If there are more branches affected by joining the history, you can simply use

    $ git filter-branch --all
    

    Now you can delete the grafts file. Check if everything is like you wanted, and remove backup in refs/original/ (see documentation for "git filter-branch" for details).
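
Removing the backup refs could be sketched as follows. To keep the example short and runnable, it does not perform a real rewrite: the backup ref is faked with `git update-ref` in a throwaway repository (an assumption for illustration only), and the loop at the end is the actual cleanup step.

```shell
# Identity for the demo commit only (assumed values).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo v1 > file.txt
git add file.txt
git commit -q -m "initial"

# Stand-in for the backup ref that filter-branch would leave behind.
git update-ref refs/original/refs/heads/master HEAD

# The cleanup itself: delete every ref under refs/original/.
git for-each-ref --format='%(refname)' refs/original/ |
while read -r ref; do
    git update-ref -d "$ref"
done

remaining=$(git for-each-ref refs/original/)
```

Only run the cleanup once you are satisfied with the rewritten history, since the backup refs are the easy way back.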

    Using refs/replace/ mechanism

    This is an alternative to the grafts file. It has the advantage that it is transferable, so if you published the short history and cannot rewrite it (because others based their work on the short history), then using refs/replace/ might be a good solution... well, at least when Git version 1.6.5 gets released.

    The refs/replace/ mechanism operates differently than a grafts file: instead of modifying the parent's information, you replace objects. So first you have to create a commit object which has the same properties as $TAIL, but has $TOP as a parent.

    We can use

    $ git cat-file commit $TAIL > TAIL_COMMIT
    

    (The name of temporary file is only an example).

    Now you need to edit 'TAIL_COMMIT' file (it would look like this):

    tree 2b5bfdf7798569e0b59b16eb9602d5fa572d6038
    author Joe R Hacker  1112911993 -0700
    committer Joe R Hacker  1112911993 -0700
    
    Initial revision of "project", after moving to new repository
    

    Now you need to add $TOP as parent, by putting a line with "parent $TOP" (where $TOP has to be expanded to SHA-1 id!) between 'tree' header and 'author' header. After editing 'TAIL_COMMIT' it should look like this:

    tree 2b5bfdf7798569e0b59b16eb9602d5fa572d6038
    parent 0f6592e3c2f2fe01f7b717618e570ad8dff0bbb1
    author Joe R Hacker  1112911993 -0700
    committer Joe R Hacker  1112911993 -0700
    
    Initial revision of "project", after moving to new repository
    

    If you want, you can edit the commit message.
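
If you would rather not edit the file by hand, the parent line can be inserted with a small awk filter. This is only a sketch: the file is recreated from a here-document so the example runs on its own, the e-mail address is a placeholder, and the $TOP value is the example ID shown above rather than a real commit.

```shell
cd "$(mktemp -d)"

# Example value from the text; in practice $TOP comes from git rev-parse.
TOP=0f6592e3c2f2fe01f7b717618e570ad8dff0bbb1

# Stand-in for the output of: git cat-file commit $TAIL > TAIL_COMMIT
# (the e-mail address is a placeholder).
cat > TAIL_COMMIT <<'EOF'
tree 2b5bfdf7798569e0b59b16eb9602d5fa572d6038
author Joe R Hacker <joe@example.com> 1112911993 -0700
committer Joe R Hacker <joe@example.com> 1112911993 -0700

Initial revision of "project", after moving to new repository
EOF

# The 'tree' header is always the first line of a commit object,
# so print a 'parent' line right after line 1.
awk -v top="$TOP" '{ print } NR == 1 { print "parent " top }' \
    TAIL_COMMIT > TAIL_COMMIT.new
mv TAIL_COMMIT.new TAIL_COMMIT

cat TAIL_COMMIT
```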

    Now you need to use git hash-object to create a new commit in the repository. You need to save the result of this command, which is the SHA-1 of a new commit object, for example like this:

    $ NEW_TAIL=$(git hash-object -t commit -w TAIL_COMMIT)
    

    (Where the '-w' option is here to actually write the object to the repository).

    Finally use git replace to replace $TAIL by $NEW_TAIL:

    $ git replace $TAIL $NEW_TAIL
    

    What is left is to check (using "git log" or some other history viewer) that the history is correct.
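
Git releases considerably newer than the 1.6.5 discussed here also offer `git replace --graft`, which creates the replacement commit with the new parent and registers it in one step. The sketch below assumes such a version is installed. The throwaway repository with a two-commit 'history' branch and a two-commit 'master' branch is made up for the demonstration.

```shell
# Identity for the demo commits only (assumed values).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

# Old history on branch 'history'.
git symbolic-ref HEAD refs/heads/history
echo v1 > file.txt
git add file.txt
git commit -q -m "old: first"
echo v2 > file.txt
git commit -q -am "old: last"

# Short history on 'master', starting from the same file contents.
git checkout -q --orphan master
git commit -q -m "new: initial import"
echo v3 > file.txt
git commit -q -am "new: more work"

TAIL=$(git rev-list --topo-order master | tail -n 1)
TOP=$(git rev-parse --verify history^0)

# Create the replacement commit (TAIL with TOP as parent) and register it.
git replace --graft "$TAIL" "$TOP"

# With the replacement active, master shows the joined history;
# --no-replace-objects shows the untouched short history.
joined=$(git rev-list --count master)
plain=$(git --no-replace-objects rev-list --count master)
echo "joined=$joined plain=$plain"
```

Comparing the two counts is a quick way to confirm that the replacement is in effect without having rewritten anything.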

    Now anybody who wants to have the full history needs to add '+refs/replace/*:refs/replace/*' as one of pull refspecs.
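
On the consumer side, adding that refspec might look like the sketch below. The publisher and clone directories are throwaway stand-ins, the remote name `origin` is simply what git clone assigns, and the replacement is created with `git replace --graft` (a shortcut in newer Git versions) only to keep the setup short.

```shell
# Identity for the demo commits only (assumed values).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Publisher: short history on master plus a replacement ref joining the old one.
git init -q "$work/pub"
cd "$work/pub"
git symbolic-ref HEAD refs/heads/history
echo v1 > file.txt
git add file.txt
git commit -q -m "old: first"
git checkout -q --orphan master
git commit -q -m "new: initial import"
git replace --graft "$(git rev-parse master)" "$(git rev-parse history)"
git branch -q -D history

# Consumer: a plain clone sees only the short history...
git clone -q "$work/pub" "$work/clone"
cd "$work/clone"
before=$(git rev-list --count master)

# ...until refs/replace/ is added to the fetch refspecs and fetched.
git config --add remote.origin.fetch '+refs/replace/*:refs/replace/*'
git fetch -q origin
after=$(git rev-list --count master)
echo "before=$before after=$after"
```

Because the refspec is opt-in, people who never add it keep working with the short history unchanged.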

    Final note: I have not checked this solution, so your mileage may vary.