git, git-svn

Exclude files from git svn clone


I'm migrating an SVN repo to Git and I have 7000+ binary files that I would like to exclude from the import, so that they never become part of the Git history, as opposed to cleaning them up afterwards (ref this question). The locations of the files don't follow a very regular pattern, so I'd have to supply a rather long list of locations to git, and I have ~8000 commits to take into consideration.

If my goal is to avoid bloating the repo with unnecessary files, what is the best approach?

Is there a way I can exclude these from the start, perhaps with a flag to git svn clone? Would adding them to a .gitignore before the clone prevent them from being added?

The other option would be to import everything, then rewrite the whole history with git filter-branch to remove all those files before sharing the repo with others.


Solution

  • Based on your question and comments, I don't think there is a simple way to clone the repository while leaving out specific files.

    I believe that listing the files in a .gitignore file won't make any difference to the clone: .gitignore only keeps untracked files in a working tree from being added, and it has no effect on what a clone imports.

    However, on the server you may be able to create a filtered branch that doesn't contain these files and pull from that, as one of the answers to this question suggests for a similar problem:

    On the server:

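    git checkout master^0    # the ^0 checks out the commit itself (detached HEAD), not the branch
    # --ignore-unmatch keeps the filter from failing on commits where the path does not exist
    git filter-branch --tree-filter 'git rm -r -q --ignore-unmatch wp-content/uploads' HEAD
    git checkout -b filtered # give the rewritten history a branch name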
    

    (filter-branch on a big project here generates new history at about 2-3 commits per second)
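
    If that speed is a concern, git filter-branch also has an --index-filter mode, which rewrites the index without checking out each tree and is described in its documentation as much faster than --tree-filter. Here is a minimal sketch on a throwaway repository. The temporary directory and the file names app.txt and big.bin are made up for the demonstration, and wp-content/uploads is simply the path used in the commands above.

```shell
#!/bin/sh
# Sketch: remove a directory from every commit with --index-filter.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice printed by newer Git

demo=$(mktemp -d)                        # throwaway repo, so nothing real is rewritten
cd "$demo"
git init -q .
git config user.email demo@example.com   # placeholder identity for the demo commits
git config user.name demo

# Two commits: one that adds a "binary" under the unwanted path, one code-only.
mkdir -p wp-content/uploads
echo code > app.txt
echo blob > wp-content/uploads/big.bin
git add .
git commit -q -m "code and binaries"
echo more >> app.txt
git commit -q -am "code only"

# Drop the path from the index of every commit; --ignore-unmatch avoids
# failing on commits that never contained it.
git filter-branch -f --index-filter \
    'git rm -r -q --cached --ignore-unmatch wp-content/uploads' HEAD >/dev/null

# Count the commits in the rewritten history that still touch the path.
remaining=$(git log --oneline HEAD -- wp-content/uploads | wc -l | tr -d ' ')
echo "commits still touching wp-content/uploads: $remaining"
```

    On a real repository only the filter-branch line is needed. The original refs are kept under refs/original/ until you delete them, so the old objects remain in that repository until they are expired and garbage-collected.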

    Then, anywhere you like,

    git init
    git remote add gimme your://repo/path
    git fetch gimme filtered
    

    As the documentation says, filter-branch is meant for rewrites like the following, which seems to cover your situation nicely:

    Those filters can modify each tree (e.g. removing a file or running a perl rewrite on all files) or information about each commit. Otherwise, all information (including original commit times or merge information) will be preserved.


    Edit: This has the added bonus that if you later want to pull from this repo into other places, it's much simpler, because it's a one-time fix applied to the original repo rather than something repeated for each individual clone.
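
    For the git svn import itself, it may also be worth checking the git-svn manual, which documents an --ignore-paths=<regex> option for init, fetch and clone that skips matching SVN paths during the import. The sketch below only shows one way to turn a list of unwanted paths into such a regex. The file name exclude.txt, the two sample paths and the SVN URL are placeholders, and how paths are prefixed (for example with trunk/) depends on your repository layout, so try the pattern on a small revision range first.

```shell
#!/bin/sh
# Sketch: build an --ignore-paths regex from a list of unwanted paths.
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical list of unwanted files, one SVN path per line.
printf '%s\n' 'assets/raw/video.bin' 'docs/old/manual.pdf' > exclude.txt

# Escape regex metacharacters in each path, then join the lines with "|".
alternatives=$(sed 's/[][\.|$(){}?+*^]/\\&/g' exclude.txt | paste -sd '|' -)

# Match any listed path at the end of the SVN path.
pattern="(${alternatives})\$"
echo "$pattern"

# The import would then look roughly like this (placeholder URL, not run here):
#   git svn clone --ignore-paths="$pattern" https://svn.example.com/repo
```

    With thousands of paths the regex may get too long for a command line. The manual lists a matching config key, svn-remote.<name>.ignore-paths, which could hold the pattern instead.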