Tags: git, github, git-svn

SVN to Git: Keep only trunk content with "basic" history


Background:

I am migrating a large SVN repository (40,000 revisions, 20+ GB of data) to Git. I fetched the repository from SVN by running git svn fetch with the following .git/config settings:

[svn-remote "svn"]
# note: this ignores tags and branches
ignore-paths = ^[^/]+/(?:branches|tags)
url = https://svn_server/repos/my_repo
fetch = :refs/remotes/git-svn

As the config settings above show, branches and tags are ignored because I only want to migrate the contents of trunk. Even so, git svn fetch still retrieved the branches and tags directories in order to keep the merge history.

At this point the remotes/git-svn branch contains:

repo/
--branches
--tags
--trunk

Goal:

I want my Git repository to contain only the contents of trunk, with branches and tags removed, and to keep only the history of files that still exist. I have no need to go back to any branch, and I don't need to see or restore any deleted file.

My first attempt was to rewrite history and remove the branches folder with the following command:

git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch branches"

After it had run for about 48 hours, I killed the process. I know I have a large amount of data, but that much time seems unreasonable to me, so I suspect I was on the wrong track.

By keeping only the history of existing files, I believe I could reduce my repository size from 20 GB to less than 1 GB and then be able to upload it to GitHub.

Question:

Is there a way to clone only the trunk contents into a new Git repository and keep only the history of files in trunk, with no reference to removed files or removed branches?


Solution

  • Well, just clone the trunk and only the trunk:

    git svn clone http://svn_server/repos/my_repo/trunk
    

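    Note that I point directly to trunk and do not use the -s (--stdlayout) option, which would otherwise tell git svn to expect the standard trunk/branches/tags layout.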
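The trunk-only clone above could be wrapped in a small script along the following lines. This is only a sketch under assumptions: the helper names `run` and `migrate_trunk`, the `DRY_RUN` variable, and the target directory `my_repo_git` are hypothetical and not part of the original answer. By default it only prints the commands it would run, so it can be checked before starting a long fetch.

```shell
#!/usr/bin/env bash
# Sketch: migrate only the trunk of an SVN repository with git svn.
# run, migrate_trunk, DRY_RUN and my_repo_git are illustrative names.
set -euo pipefail

# Print a command; execute it only when DRY_RUN=0 (dry run is the default).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

migrate_trunk() {
    local svn_base="$1"   # repository root, e.g. https://svn_server/repos/my_repo
    local target="$2"     # local directory for the new Git repository

    # Point directly at trunk and omit -s, so branches and tags are never fetched.
    run git svn clone "${svn_base}/trunk" "$target"

    # Report the on-disk size of the resulting repository.
    run git -C "$target" count-objects -vH
}

# Usage (prints the commands only; set DRY_RUN=0 to actually clone):
# migrate_trunk https://svn_server/repos/my_repo my_repo_git
```

With `DRY_RUN=0` the same call performs the real clone. The size reported by `git count-objects -vH` gives a quick way to see whether the result is small enough to upload to GitHub.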