git, rebase, git-rebase, git-rewrite-history

How do I reduce the size of a bloated Git repo by non-interactively squashing all commits except for the most recent ones?


My Git repo has hundreds of gigabytes of data, say, database backups, so I'm trying to remove old, outdated backups, because they're making everything larger and slower. So I naturally need something that's fast; the faster, the better.

How do I squash (or just plain remove) all commits except for the most recent ones, and do so without having to manually squash each one in an interactive rebase? Specifically, I don't want to have to use

git rebase -i --root

For example, I have these commits:

A .. B .. C ... ... H .. I .. J .. K .. L

What I want is this (squashing everything in between A and H into A):

A .. H .. I .. J .. K .. L

Or even this would work fine:

H .. I .. J .. K .. L

There is an answer on how to squash all commits, but I want to keep some of the more recent commits unsquashed. (In particular, I need to keep the two most recent commits.)

(Edit, several years later. The right answer to this question is to use the right tool for the job. Git is not a very good tool to store backups, no matter how convenient it is. There are better tools.)


Solution

  • The original poster comments:

    if we take a snapshot of a commit 10004, remove all commits before it, and make commit 10004 a root commit, I'll be just fine

    Here is one way to do this, assuming your current branch is called branchname. I like to set a temporary tag before any large rebase, both to double-check afterwards that nothing changed and to mark a point I can reset back to if something goes wrong (I'm not sure whether this is standard procedure, but it works for me):

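    git tag temp                    # marker at the current HEAD (the tip of branchname)
    
    git checkout 10004              # detach HEAD at the commit that becomes the new root
    git checkout --orphan new_root  # start a parentless branch; the index keeps 10004's tree
    git commit -m "set new root 10004"
    
    git rebase --onto new_root 10004 branchname   # replay everything after 10004 onto the new root
    
    git diff temp   # prints nothing if the rebased tip matches the original tree
    git tag -d temp
    git branch -D new_root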
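
    The re-rooting steps can be tried safely on a throwaway repository before touching real data. The following is only a sketch under stated assumptions: the temporary directory, the file name file.txt, the five demo commits, and the tag name cut (standing in for 10004) are all made up for illustration.

    ```shell
    #!/bin/sh
    # Hypothetical demo: re-root a five-commit history at its third commit.
    set -eu

    repo=$(mktemp -d)
    cd "$repo"
    git init -q
    git config user.name "Demo"
    git config user.email "demo@example.com"
    git symbolic-ref HEAD refs/heads/branchname   # name the initial branch "branchname"

    # Five commits, each rewriting file.txt; the third one plays the role of 10004.
    for i in 1 2 3 4 5; do
        echo "version $i" > file.txt
        git add file.txt
        git commit -q -m "commit $i"
        if [ "$i" -eq 3 ]; then git tag cut; fi
    done

    git tag temp                                   # safety marker at the original tip

    git checkout -q cut                            # detached HEAD at the cut commit
    git checkout -q --orphan new_root              # parentless branch, index keeps cut's tree
    git commit -q -m "set new root"

    git rebase -q --onto new_root cut branchname   # replay commits 4 and 5 onto the new root

    git diff --quiet temp && echo "tree unchanged" # final tree matches the original tip
    git rev-list --count branchname                # number of commits left on branchname
    ```

    In this demo the rewritten branch should hold three commits (the new root plus commits 4 and 5), while the working tree is identical to the one tagged temp.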
    

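    To get rid of the old history, you'll need to delete every tag and branch that still points into it. The reflog also keeps the old commits reachable, so expire it first; then

    git reflog expire --expire=now --all   # drop reflog entries that still reference the old commits
    git gc --prune=now                     # repack and delete the now-unreachable objects
    

    will clean the old commits from your repo.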

    Note that you'll temporarily have two copies of everything, until you have gc'd, but that is unavoidable; even if you do a standard squash and rebase you still have two copies of everything until the rebase finishes.
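
    To check that the cleanup really dropped the old objects, one option is to record the hash of a commit that should disappear and test for it afterwards. The sketch below assumes a throwaway repository; the directory, file.txt, and the old_root variable are hypothetical names. It expires the reflog before collecting, because reflog entries otherwise keep the old commits alive.

    ```shell
    #!/bin/sh
    # Hypothetical demo: confirm that a dropped commit is gone after cleanup.
    set -eu

    repo=$(mktemp -d)
    cd "$repo"
    git init -q
    git config user.name "Demo"
    git config user.email "demo@example.com"
    git symbolic-ref HEAD refs/heads/branchname

    for i in 1 2 3; do
        echo "backup $i" > file.txt
        git add file.txt
        git commit -q -m "commit $i"
    done
    old_root=$(git rev-list --max-parents=0 branchname)   # commit 1, about to be dropped

    # Re-root the branch at commit 2, following the steps from the answer.
    git tag temp
    git checkout -q branchname~1
    git checkout -q --orphan new_root
    git commit -q -m "set new root"
    git rebase -q --onto new_root temp~1 branchname
    git tag -d temp >/dev/null
    git branch -q -D new_root

    # No branch or tag points at the old history now, but the objects still exist.
    git cat-file -e "$old_root" && echo "before cleanup: old root still stored"

    git reflog expire --expire=now --all   # forget reflog references to old commits
    git gc -q --prune=now                  # delete unreachable objects immediately

    if git cat-file -e "$old_root" 2>/dev/null; then
        echo "after cleanup: old root still stored"
    else
        echo "after cleanup: old root pruned"
    fi
    git count-objects -vH                  # on-disk size summary for comparison
    ```

    On a real repository, running git count-objects -vH before and after the cleanup gives a rough measure of how much space was reclaimed.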