git, version-control, git-svn

Avoiding (duplicate) commits with different hashes when dcommitting


Apologies for the long question; I thought it best to include as much information as possible.

Question

How can I keep a Gitlab-hosted repo in sync with an (authoritative) svn repo and avoid duplicating commits when using git svn dcommit?

Setup

I have an svn repository hosted on a local server. A remote team (which does not have access to this server) is using git to develop a sub-tree of the software in the repository. For business reasons, the svn repository is considered authoritative for releases, etc. So, I am using git-svn to keep the teams synchronized.

Repository info

  • "Local" svn repo at svn://project
  • Gitlab instance running at gitlab.mydomain.com
  • I am the only user with both an svn and a Gitlab account; git users do not have svn accounts (or access to the server) and vice versa

Consider the above arrangement unchangeable, so please don't suggest an alternative arrangement. I'm interested in answers that address my specific, lower-level question which I promise is coming shortly =)

Initial cloning

The repository is initially cloned from svn to Gitlab like so:

git svn clone --prefix=svn/ --preserve-empty-dirs svn://project gitclone
cd gitclone
git remote add origin https://gitlab/me/project.git
git push --set-upstream origin master

The problem

When I develop a feature on a git branch and integrate it back to the svn repository, I end up with a lot of duplicate commits in the git commit history. The svn commit history looks fine. I've determined that this is because the commits back to svn created with git svn dcommit have different hashes than the 'original' commits on the git side of things. Here is an example flow:

git developer

$ git clone https://gitlab/me/project.git
$ git checkout -b git-f2
Switched to a new branch 'git-f2'

$ mkdir git-f2
$ touch git-f2/git-f2.txt
$ git add .
$ git commit -m "Add f2"
[git-f2 686a513] Add f2
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 git-f2/git-f2.txt

$ echo "some text" > git-f2/git-f2.txt
$ git commit -am "Update f2"
[git-f2 e84af9a] Update f2
 1 file changed, 1 insertion(+)

$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.

$ git merge git-f2
Updating dc4d50b..e84af9a
Fast-forward
 git-f2/git-f2.txt | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 git-f2/git-f2.txt

$ git status
On branch master
Your branch is ahead of 'origin/master' by 2 commits.
  (use "git push" to publish your local commits)
nothing to commit, working tree clean

$ git push
Counting objects: 8, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (8/8), 724 bytes | 0 bytes/s, done.
Total 8 (delta 0), reused 0 (delta 0)
To https://gitlab.mydomain.com/me/project.git
   dc4d50b..e84af9a  master -> master

Me (repo synchronizer)

$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean

$ git fetch
remote: Counting objects: 8, done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 8 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (8/8), done.
From https://gitlab.mydomain.com/me/project
   dc4d50b..e84af9a  master     -> origin/master

$ git status
On branch master
Your branch is behind 'origin/master' by 2 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)
nothing to commit, working tree clean

$ git pull
Updating dc4d50b..e84af9a
Fast-forward
 git-f2/git-f2.txt | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 git-f2/git-f2.txt

$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean

At this point, git log shows the same commits on my (synchronizer) local master as are on the remote:

$ git log --format=oneline
commit e84af9ae738d782dfa5499cfb93b3dcb73cbf179 (HEAD -> master, origin/master)
commit 686a513eaf0083ad234e383f7e543df19431eff5
commit dc4d50bd66f36595d539c4f0c2ad70079c277315 (svn/git-svn)
commit 7e330320ac7d36331a8fb525f63fdf60f4ee070f

But then when I dcommit, commits 686a51 and e84af9 get reproduced on my local master with new hashes (ed7b23 and 605348, respectively):

$ git svn dcommit --use-log-author --add-author-from
Committing to svn://project ...
        A       git-f2/git-f2.txt
Committed r306
        A       git-f2/git-f2.txt
r306 = ed7b23e5abe29c09ff4483d811c1d645916e075b (refs/remotes/svn/git-svn)
        M       git-f2/git-f2.txt
Committed r307
        M       git-f2/git-f2.txt
r307 = 605348ec24142e2d382b295dbb34aa20c507fad9 (refs/remotes/svn/git-svn)
No changes between e84af9ae738d782dfa5499cfb93b3dcb73cbf179 and refs/remotes/svn/git-svn
Resetting to the latest refs/remotes/svn/git-svn

Now my local master no longer matches the Gitlab master, and it's clear why:

$ git status
On branch master
Your branch and 'origin/master' have diverged,
and have 2 and 2 different commits each, respectively.
  (use "git pull" to merge the remote branch into yours)
nothing to commit, working tree clean

$ git log --graph --format=oneline
* 605348ec24142e2d382b295dbb34aa20c507fad9 (HEAD -> master, svn/git-svn) Update f2
* ed7b23e5abe29c09ff4483d811c1d645916e075b Add f2
* dc4d50bd66f36595d539c4f0c2ad70079c277315 <redacted>
* 7e330320ac7d36331a8fb525f63fdf60f4ee070f <redacted>

At this point, doing a git pull results in a merge commit being added to my local master. A subsequent git push leaves the following graph:

$ git log origin/master --graph --format=oneline
*   e3bdac6d4fd58fbd006d777ceb3e87d31ee14ace (HEAD -> master, origin/master) GIT PULL Merge branch 'master' of https://gitlab.mydomain.com/me/project
|\
| * e84af9ae738d782dfa5499cfb93b3dcb73cbf179 Update f2
| * 686a513eaf0083ad234e383f7e543df19431eff5 Add f2
* | 605348ec24142e2d382b295dbb34aa20c507fad9 (svn/git-svn) Update f2
* | ed7b23e5abe29c09ff4483d811c1d645916e075b Add f2
|/
* dc4d50bd66f36595d539c4f0c2ad70079c277315 <redacted>
* 7e330320ac7d36331a8fb525f63fdf60f4ee070f <redacted>

The svn log and history is just fine, so I'm wondering if there is any way to avoid the 'new' commits being made on my local master when I do the dcommit. This seems like it would avoid the issue I'm seeing here. Or, since I'm a git novice, I may be completely wrong!


Solution

  • In short, this is how git svn dcommit works: it creates one SVN revision for each new git commit and then rewrites those git commits.

    This is how the documentation describes git svn dcommit:

    • Commit each diff from the current branch directly to the SVN repository, and then rebase or reset (depending on whether or not there is a diff between SVN and head). This will create a revision in SVN for each commit in Git.

    From the git svn documentation

    • This takes all the commits you’ve made on top of the Subversion server code, does a Subversion commit for each, and then rewrites your local Git commit to include a unique identifier. This is important because it means that all the SHA-1 checksums for your commits change.

    From the Pro Git book, "Committing Back to Subversion"
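The "unique identifier" in the quote above is a git-svn-id: line that git svn appends to each rewritten commit message. Because a commit's SHA-1 covers its message, that extra line alone is enough to produce a new hash even when the files are identical. The sketch below imitates this with plain git only (no svn server); the svn URL, revision number and repository UUID in the git-svn-id line are made-up placeholders, and git commit-tree is used as a stand-in for what dcommit does internally.

```shell
#!/bin/sh
# Minimal sketch, plain git only: imitate dcommit's rewrite of one commit.
# The git-svn-id URL, revision and UUID below are placeholders.
set -e
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo "some text" > f2.txt
git add f2.txt
git commit -q -m "Update f2"
original=$(git rev-parse HEAD)

# Same tree, same (empty) parent list, same subject line; only the message
# gains a git-svn-id line, like the commits dcommit writes back.
rewritten=$(git commit-tree -m "Update f2" \
  -m "git-svn-id: svn://project@307 00000000-0000-0000-0000-000000000000" \
  "$original^{tree}")

echo "original:  $original"
echo "rewritten: $rewritten"
# Show the raw commit object, including the appended git-svn-id line.
git cat-file -p "$rewritten"
```

Comparing the two printed hashes shows they differ, while git rev-parse "$original^{tree}" and git rev-parse "$rewritten^{tree}" print the same tree.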

    This can be illustrated with the following graphs:

    Assume that, after you pull the changes from the Gitlab repo into your synchronizer repo and before you run git svn dcommit, the commit history looks like this:

    …---A---B---C---D---E  master, origin/master
                |
            svn/git-svn
    

    When you run git svn dcommit, it creates SVN revisions from the new commits D and E, which come after svn/git-svn. It also rewrites D and E as new commits with different SHA-1 values (D' and E' in the graph below) and moves master and svn/git-svn to E'. So after the command runs, the commit history is:

                  D'---E'  master, svn/git-svn
                 /
    …---A---B---C---D---E  origin/master
    

    This is why git status reports "Your branch and 'origin/master' have diverged".
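The divergence in the graphs above can be reproduced locally without svn or Gitlab, which makes it easy to see that the two sides hold the same content in different commits. This is a sketch under stated assumptions: refs/remotes/origin/master is created by hand with git update-ref to stand in for the Gitlab branch, and D'/E' are imitated with git commit-tree plus placeholder git-svn-id lines rather than produced by a real dcommit.

```shell
#!/bin/sh
# Sketch: reproduce the "diverged" state with plain git only.
# origin/master is a hand-made stand-in for Gitlab; the UUID is a placeholder.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
uuid=00000000-0000-0000-0000-000000000000

git commit -q --allow-empty -m "C"        # last commit already in svn
base=$(git rev-parse HEAD)

mkdir git-f2 && touch git-f2/git-f2.txt    # D
git add git-f2 && git commit -q -m "Add f2"
echo "some text" > git-f2/git-f2.txt      # E
git commit -q -am "Update f2"
git update-ref refs/remotes/origin/master HEAD

# D' and E': same trees and subjects as D and E; the extra git-svn-id
# line in each message gives them new hashes.
d2=$(git commit-tree -p "$base" -m "Add f2" \
  -m "git-svn-id: svn://project@306 $uuid" "HEAD~1^{tree}")
e2=$(git commit-tree -p "$d2" -m "Update f2" \
  -m "git-svn-id: svn://project@307 $uuid" "HEAD^{tree}")
git reset -q --hard "$e2"

# Prints "<commits only on master> <commits only on origin/master>".
git rev-list --left-right --count master...origin/master
# git diff --quiet exits 0 when both tips have identical content.
git diff --quiet master origin/master && echo "same content, different commits"
git log --graph --oneline --all
```

The rev-list line corresponds to the "2 and 2 different commits" that git status reports in the question.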

    Since you do not want to change the arrangement, you can force push from your synchronizer repo to the Gitlab repo with git push -f origin master. The commit history will then be:

    …---A---B---C---D'---E'  master, origin/master, svn/git-svn
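The force-push workaround can be tried end to end against a throwaway setup before running it on the real Gitlab repo. In this sketch a local bare repository stands in for Gitlab, the dcommit rewrite is again imitated with git commit-tree and placeholder git-svn-id lines, and all paths are hypothetical. It uses git push --force-with-lease instead of -f; that is a standard git push option which refuses to overwrite the remote branch if it has moved since your last fetch, which guards against a developer pushing between your pull and your push. Any clone that still contains the old D and E (such as the git developer's) then has to move to D' and E', for example with git fetch followed by git reset --hard origin/master, which discards unpushed local work on that branch.

```shell
#!/bin/sh
# End-to-end sketch of the force-push workaround using only plain git.
# Hypothetical setup: a local bare repo stands in for Gitlab, and dcommit's
# rewrite is imitated with git commit-tree (placeholder git-svn-id values).
set -e
top=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
uuid=00000000-0000-0000-0000-000000000000

# Stand-in "Gitlab" with master as its default branch.
git init -q --bare "$top/gitlab.git"
git --git-dir="$top/gitlab.git" symbolic-ref HEAD refs/heads/master

# Synchronizer repo publishes commit C.
git init -q "$top/sync"
cd "$top/sync"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "C"
git remote add origin "$top/gitlab.git"
git push -q -u origin master
base=$(git rev-parse HEAD)

# Developer clones, adds D and E, and pushes them.
git clone -q "$top/gitlab.git" "$top/dev"
cd "$top/dev"
mkdir git-f2 && touch git-f2/git-f2.txt
git add git-f2 && git commit -q -m "Add f2"
echo "some text" > git-f2/git-f2.txt
git commit -q -am "Update f2"
git push -q origin master

# Synchronizer pulls D and E, then imitates dcommit rewriting them as D', E'.
cd "$top/sync"
git pull -q --ff-only
d2=$(git commit-tree -p "$base" -m "Add f2" \
  -m "git-svn-id: svn://project@306 $uuid" "HEAD~1^{tree}")
e2=$(git commit-tree -p "$d2" -m "Update f2" \
  -m "git-svn-id: svn://project@307 $uuid" "HEAD^{tree}")
git reset -q --hard "$e2"

# Replace Gitlab's D and E with D' and E'; refused if origin moved meanwhile.
git push -q --force-with-lease origin master

# Developer drops the old D and E and adopts the rewritten history.
cd "$top/dev"
git fetch -q origin
git reset -q --hard origin/master
```

On a real Gitlab instance, master may be a protected branch that rejects force pushes unless that is allowed in the branch protection settings.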