Here is the situation.
The project source code was hosted on a git service provider (e.g. Bitbucket), with a size over 1GB. We migrated all the work to a new git service provider (e.g. GitHub), pruning old large files and objects along the way, which brought the size down to 500MB.
It has been a few weeks since the transition. All of a sudden the repo size is over 1.8GB, and lo and behold, we again have some of the old objects that were deleted from the old repo.
Now how do I find the commit/push that caused this? I know roughly when it happened, but I can't pinpoint the commit or the branch that might be causing this. Also is there an easier way to revert the push to get the repo size back to normal?
Another question would be, how can I prevent these objects being pushed back again by accident?
My search led me to some relevant SO answers, but I came back empty-handed.
Git is very much oriented to the idea of adding new things (commits and their underlying objects) to the database, without ever removing any old things.
When you do manage to remove some old thing(s), if Git ever encounters them again, it sees them as new things and adds them back in. You can, if you like, think of this as getting "re-infected". Every copy of the repository that has the "infection" is "contagious", and touching any of them (via git fetch or git push) can bring back the objects you thought you had gotten rid of.
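If you kept a list of the object IDs you pruned, a quick way to see whether a given clone is "infected" is to ask it whether it still has any of them. This is only a sketch: the function name and the idea of feeding it a list of IDs on standard input are my own assumptions, not something Git provides.

```shell
# Hypothetical helper: read object IDs (one per line) on stdin and print
# the ones that exist in the current repository's object database.
# Note that this reports objects that are present even if nothing
# references them any more.
reintroduced_objects() {
    while read -r sha; do
        # `git cat-file -e` exits 0 if the object exists, non-zero otherwise.
        if git cat-file -e "$sha" 2>/dev/null; then
            echo "$sha"
        fi
    done
}
```

Run inside a clone as, for example, reintroduced_objects < purged-ids.txt (a hypothetical file name). Any output means that clone can bring the old objects back on its next push.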
Now how do I find the commit/push that caused this?
Finding a particular fetch or push that caused it is difficult-to-impossible. Finding the commit(s) that contain the large objects is possible; see the answer you linked, and other links within it.
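As a rough sketch of that search (the function names are mine, and --find-object needs Git 2.16 or later), you can list the largest blobs reachable from any ref and then ask which commits introduced or removed a particular one:

```shell
# Print the N largest blobs reachable from any ref (default 10),
# one per line, as: <object-id> <size-in-bytes> <path>
largest_blobs() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |
        sort -k2,2nr |
        head -n "${1:-10}"
}

# List commits, on any ref, that add or remove the given blob.
commits_touching_blob() {
    git log --all --oneline --find-object="$1"
}
```

Once you have a commit ID from the second function, git branch -a --contains <commit> shows which branches carry it, which is usually the closest you can get to "which push did this".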
Also is there an easier way to revert the push to get the repo size back to normal?
You must ditch the commit(s) that contain the large objects, and if there are later commits that you wish to retain that depend on those earlier commits, copy the later commits to new, different commits that no longer depend on the earlier commits. This is what git filter-branch does. Once you have no branch tip that either points to, or has in its commit ancestry chain, the commits that have the large objects, you can re-pack and shrink the repository.
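A minimal sketch of that sequence, assuming the large object lives at a single known path: the purge_path name is hypothetical, the path must not contain a single quote, and this rewrites every ref, so try it on a fresh clone first.

```shell
# Hypothetical helper: remove one path from every commit on every ref,
# then drop the leftovers so the repository can actually shrink.
purge_path() {
    path=$1
    # Copy every commit, minus the unwanted path; tags are rewritten too.
    # The environment variable only silences filter-branch's startup warning.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
        --index-filter "git rm -r --cached --ignore-unmatch -- '$path'" \
        --prune-empty --tag-name-filter cat -- --all
    # filter-branch saves the old tips under refs/original; delete them.
    git for-each-ref --format='%(refname)' refs/original |
        while read -r ref; do
            git update-ref -d "$ref"
        done
    # Reflogs also keep the old commits alive; expire them, then repack.
    git reflog expire --expire=now --all
    git gc --prune=now
}
```

This only shrinks the copy you run it in. The rewritten refs still have to be force-pushed, and the hosting side has to run its own garbage collection before the reported size drops.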
The BFG Repo-Cleaner is reportedly much easier to use (it does all this for you), but I have never used it.
... how can I prevent these objects being pushed back again by accident?
This is trickier. There are a number of approaches that work to varying degrees: