I have a local repository currently in development and I'd like to share (part of) it publicly on GitHub. What I've done so far:
git checkout dev # dev is the current development branch of my local repository
git branch public # create a new branch from dev for the public repo
git checkout public
git remote add public git@github.com # add the public repo as a new remote
git push -u public public:master # push local 'public' branch to 'master' branch of 'public' remote
This push failed, however, because my repository contains some fairly large subdirectories. So I set about cleaning it:
git rm -r --cached external # remove large subdirectory 'external'
git rm -r --cached ... # repeat for other large subdirectories
I then added all of the above subdirectories to .gitignore as well and committed. A call to git ls-files now shows only a small number of files, with a combined size of at most a few MB, and git status shows no uncommitted or untracked files. However, git push still fails, apparently because the large subdirectories are still included in the branch's history.
The correct way to purge files from history seems to be the git filter-branch command. However, that command comes with a lot of warnings attached, and I don't want to mess up my entire repository in the process. How do I properly purge the subdirectories (and only the subdirectories) I removed above with git rm from the history of the public branch (and only the public branch)?
Since the branch is unlikely to ever be merged back into the other branches, I'd be OK with simply removing all history from it as well, as a last resort. The other branches should still remain exactly as they are, however.
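For reference, the last-resort option (a public branch with no history at all) can be sketched with an orphan branch. This is only an illustration in a throwaway repository: the file names app.txt and external/data.bin and the commit messages are made up, and the real repository would already have its dev branch.

```shell
#!/bin/sh
# Sketch: create a history-free 'public' branch without touching 'dev'.
set -eu

demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.email "demo@example.com"   # hypothetical identity for the demo
git config user.name "Demo"
git symbolic-ref HEAD refs/heads/dev        # name the first branch 'dev'

# Stand-in for the existing development history (hypothetical files).
mkdir external
echo "big blob" > external/data.bin
echo "code" > app.txt
git add .
git commit -q -m "dev: initial work"

# Start a branch with no parent commit; the index still holds dev's files.
git checkout -q --orphan public

# Drop the large directory from the index only (--cached keeps it on disk).
# -f skips git rm's up-to-date check, since there is no HEAD commit yet.
git rm -r -q -f --cached external
git commit -q -m "Initial public commit"

public_commits=$(git rev-list --count public)
public_hits=$(git rev-list public -- external | wc -l | tr -d ' ')
dev_hits=$(git rev-list dev -- external | wc -l | tr -d ' ')
echo "public has $public_commits commit(s), $public_hits touching external"
echo "dev still has $dev_hits commit(s) touching external"
```

Because the new branch has no parents, pushing it only sends the objects reachable from that single commit; dev and its history are left as they were.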
In a sense branches don't really exist in git: they are just pointers to a particular commit, and from there to the history that led up to that commit. So your repository might look something like this, schematically:
                    +-- E --- F <- main branch
                   /
A --- B --- C --- D
                   \
                    +-- G --- H <- public branch
If the large files existed in any of commits A, B, C and D, then by definition they exist in the history of both the main and the public branch.
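To check this on a real repository, you can ask git which commits reachable from each branch touched a given path. The following is a minimal sketch in a throwaway repository; the directory name external comes from the question, while app.txt, data.bin and the commit messages are invented for the demo.

```shell
#!/bin/sh
# Sketch: see which branches can reach commits that touched 'external'.
set -eu

demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.email "demo@example.com"   # hypothetical identity for the demo
git config user.name "Demo"
git symbolic-ref HEAD refs/heads/main       # name the first branch 'main'

# Commit A: a small file only.
echo "code" > app.txt
git add app.txt
git commit -q -m "A: add app"

# Commit B: the large directory is added before the branches diverge.
mkdir external
echo "big blob" > external/data.bin
git add external
git commit -q -m "B: add external"

# Fork 'public' and untrack the directory on that branch only.
git checkout -q -b public
git rm -r -q --cached external
git commit -q -m "G: untrack external"

# Count the commits reachable from each branch that changed 'external'.
main_hits=$(git rev-list main -- external | wc -l | tr -d ' ')
public_hits=$(git rev-list public -- external | wc -l | tr -d ' ')
echo "main:   $main_hits commit(s) touching external"
echo "public: $public_hits commit(s) touching external"
```

The public branch still reaches the commit that added the directory (plus the one that removed it), which is why git rm alone does not shrink what git push has to send.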
To rewrite history, you have to create new commits right back to when those files were first added. You can do this with the git-filter-repo tool like this:
git filter-repo --invert-paths --path 'directory/to/delete' --refs public
Let's assume the files were first added in commit B; we might now have something like this:
  +-- B --- C --- D -- E --- F <- main branch
 /
A
 \
  +-- B2 --- C2 --- D2 --- G2 --- H2 <- public branch
This appears to be what you want, but it's no longer very usable as a branch: if you ever tried to merge anything into it from main, you'd end up with this:
  +-- B --- C --- D -- E --- F ----- X <- main branch with new feature
 /                                    \
A                                      \
 \                                      \
  +-- B2 --- C2 --- D2 --- G2 --- H2 --- M <- public branch with merge commit
The original version of commit B, which contains our large files, is now back in the branch history, as well as the new commit B2.
So, rather than worrying about which branches do and don't contain the files, it might be easier to simply take a copy of the repository with a new name, and make it as though those files had never existed anywhere in the repository's history.
git filter-repo --invert-paths --path 'directory/to/delete'
This will rewrite all your commits, giving a completely new history:
                        +-- E2 --- F2 <- main branch
                       /
A2 --- B2 --- C2 --- D2
                       \
                        +-- G2 --- H2 <- public branch
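Putting the copy-then-rewrite approach together, a minimal sketch might look like the following. It builds a throwaway stand-in for the original repository (the names original, public-copy, app.txt and data.bin are invented), clones it, and runs the filter only in the clone. It assumes git-filter-repo is installed separately and skips the rewrite if it is not.

```shell
#!/bin/sh
# Sketch: rewrite history in a copy, leaving the original repository alone.
set -eu

work=$(mktemp -d)
cd "$work"

# Throwaway 'original' repository standing in for the real one.
git init -q original
cd original
git config user.email "demo@example.com"   # hypothetical identity for the demo
git config user.name "Demo"
git symbolic-ref HEAD refs/heads/main
mkdir external
echo "big blob" > external/data.bin
echo "code" > app.txt
git add .
git commit -q -m "add app and external"
git branch public
cd ..

# Work on a fresh clone so the original is never rewritten.
git clone -q --no-local original public-copy
cd public-copy

before=$(git rev-list --all -- external | wc -l | tr -d ' ')

if git filter-repo -h >/dev/null 2>&1; then
    # Paths are given relative to the repository root (no leading slash).
    git filter-repo --invert-paths --path external
    after=$(git rev-list --all -- external | wc -l | tr -d ' ')
    result="rewritten"
else
    after="$before"
    result="skipped"
    echo "git-filter-repo not found; nothing was rewritten" >&2
fi

original_hits=$(git -C ../original rev-list --all -- external | wc -l | tr -d ' ')
echo "$result: $before -> $after commit(s) touching external in the copy"
echo "original still has $original_hits commit(s) touching external"
```

After the rewrite the copy should have no commits touching the directory, so it can be given the GitHub remote and pushed, while the original keeps its full history for ongoing development.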