
Git split subproject with history of all branches, repository too big


I originally had an SVN repository with multiple projects in it.

I converted it to a Git repository using SubGit.

The repository:

MyRepository/
    .git/
    Project1/
    Project2/
    Project3/
    ...

This repository has multiple branches (v1, v2, v3, v4, v5, master).

I'm trying to split my repository into multiple repositories, keeping the full history and branches in each repository.

I was able to do it using the script from https://stackoverflow.com/a/26033230/2558653:

#!/bin/bash

repoDir="C:\Sources\MyRepository"
folder="Project1"
newOrigin="https://gitlab.com/myUser/project1.git"

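cd "$repoDir" || exit 1

# Detach HEAD so that every local branch can be deleted
git checkout --detach
git branch | grep --invert-match "*" | xargs git branch -D

# Create a local tracking branch for each remote branch (skip the origin/HEAD alias)
for remote in `git branch --remotes | grep --invert-match "\->"`
do
        git checkout --track "$remote"
        git add -vA *
        git commit -vam "Changes from $remote" || true
done

# Drop the old remote, then rewrite all refs so that $folder becomes the root
git remote remove origin
git filter-branch --prune-empty --subdirectory-filter "$folder" -- --all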

#prune old objects
rm -rf .git/refs/original/*
git reflog expire --all --expire-unreachable=0
git repack -A -d
git prune

#upload to new remote
git remote add origin $newOrigin
git push origin master

for branch in `git branch | grep -v '\*'`
do
        git push origin $branch
done

It worked, but the repository for my subfolder is now 2.7 GB, while the current files total only 20 MB.

I ran this command to list the largest files in the repository, and I noticed that some of them are not in my subproject and should have been removed:

git rev-list --objects --all \
  | grep "$(git verify-pack -v .git/objects/pack/*.idx \
           | sort -k 3 -n \
           | tail -10 \
           | awk '{print$1}')"

... 
00a2e4e398bd1805ad2524d86276ee72216c1f67 OtherFolder/Distribution/NsiScripts/file.exe
...
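
A variant of the listing above that also prints each blob's size may make the offenders easier to spot. This is only a sketch: the function name `largest_blobs` is made up for illustration, and it reads every object reachable from any ref rather than inspecting the pack index.

```shell
#!/bin/bash
# Hypothetical helper: print the N largest blobs reachable from any ref,
# one per line, as "<size in bytes> <object id> <path>".
largest_blobs() {
    local count="${1:-10}"
    git rev-list --objects --all \
      | git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' \
      | sed -n 's/^blob //p' \
      | sort -n -r \
      | head -n "$count"
}
```

Running `largest_blobs 10` inside the split repository should show whether paths outside the subfolder are still reachable from some branch.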

Is there a way to adapt the script so that it removes all files that are not in my subfolder and reduces the size? Is that what the script was supposed to do?


Solution

  • In the end, the script I used did work, but I was processing all branches, and my "Project2" folder does not exist in some of the old branches, so the script kept every folder for those branches.

    So I had to include only the branches in which "Project2" exists:

    for remote in `git branch --remotes | grep --invert-match "\->\|origin/v1\|origin/v2\|origin/v3"`
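
Instead of excluding branches by hand, the list could be built automatically. The following is a rough sketch, not part of the original script: `branches_with_folder` is a hypothetical helper, and it only checks the tip commit of each remote branch, so a branch where the folder existed earlier but was later deleted would be skipped.

```shell
#!/bin/bash
# Hypothetical helper: print every remote-tracking branch whose tip commit
# contains the given folder as a directory.
branches_with_folder() {
    local folder="$1" ref
    git for-each-ref --format='%(refname)' refs/remotes \
      | while read -r ref; do
            # Skip aliases such as refs/remotes/origin/HEAD
            case "$ref" in */HEAD) continue ;; esac
            # "<ref>:<path>" names the object at that path in the tip commit
            if [ "$(git cat-file -t "$ref:$folder" 2>/dev/null)" = "tree" ]; then
                printf '%s\n' "${ref#refs/remotes/}"
            fi
        done
}
```

The loop in the script could then iterate over the output of `branches_with_folder Project2` rather than over a filtered `git branch --remotes`.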
    

    If someone needs to keep multiple folders when splitting, it can be done with:

    git filter-branch --index-filter 'git rm --ignore-unmatch --cached -qr -- . && git reset -q $GIT_COMMIT -- Project1/ Project2/' --prune-empty -- --all
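
After a rewrite like this, one way to check that nothing else survived is to list every top-level path that still appears anywhere in the history. This is a minimal sketch with a made-up function name, `top_level_paths`; it relies on `git log --name-only`, which does not list files for merge commits by default.

```shell
#!/bin/bash
# Hypothetical helper: print the distinct top-level files and directories
# touched by any commit reachable from any ref.
top_level_paths() {
    git log --all --pretty=format: --name-only \
      | awk -F/ 'NF { print $1 }' \
      | sort -u
}
```

If the output contains anything other than the folders that were meant to be kept (here `Project1` and `Project2`), some branch was probably left unfiltered.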